Patent abstract:
These are techniques and systems for processing video data. In one example, a media file associated with 360-degree video data can be obtained. The 360-degree video data can include a spherical representation of a scene. The media file may include first signaling information and second signaling information for a viewport region corresponding to a region of interest (ROI) in the spherical representation. The first signaling information may include a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation. The second signaling information may indicate a region of a picture comprising the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane. Pixels of the picture data corresponding to the viewport region may be extracted based on the first signaling information and the second signaling information and may be provided for rendering.
Publication number: BR112019010875A2
Application number: R112019010875
Filing date: 2017-12-01
Publication date: 2019-10-01
Inventors: Van Der Auwera Geert; Wang Yekui
Applicant: Qualcomm Inc
Primary IPC:
Patent description:

SYSTEMS AND METHODS OF SIGNALING REGIONS OF INTEREST
FIELD [001] This application relates to video encoding and compression. More specifically, the present application refers to systems and methods for generating and processing files to signal regions of interest.
BACKGROUND [002] Many devices and systems allow video data to be processed and output for consumption. Digital video data includes large amounts of data to meet the demands of consumers and video providers. For example, consumers of video data desire video of the highest quality, with high fidelity, resolution, frame rates, and the like. As a result, the large amount of video data required to meet these demands places a burden on the communication networks and the devices that process and store the video data.
[003] Various video coding techniques can be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), Moving Picture Experts Group (MPEG) coding, or the like. Video coding generally uses prediction methods (for example, inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in images or video sequences. An important goal of video coding techniques is to compress video data in a way that facilitates both the transmission and the rendering of the video data.
BRIEF SUMMARY [004] In some examples, this document describes techniques and systems for generating media files for 360-degree video content that include signaling information for one or more regions of interest (ROIs) in the 360-degree video content. This document also describes techniques and systems for processing the signaling information included in the media files to extract one or more ROIs from the video content for rendering. The 360-degree video content can be a spherical video formed by stitching together a set of images that capture a scene at a certain moment in time. An ROI of a 360-degree video picture can be a predetermined region of the picture that captures a particular portion of the scene (for example, a region based on a director's cut to direct the audience's view, a region that is statistically more likely to be rendered to a user at the time of presentation of the picture, or another predetermined region of interest). An ROI can also be determined dynamically based, for example, on the viewer's orientation. The signaling information can be used for various purposes, such as for data retrieval in adaptive streaming of 360-degree video, for transcoding optimization when a 360-degree video is transcoded, for cache management, to facilitate 360-degree video rendering, among others.
[005] The media files can include any suitable streaming media file, such as a media presentation description (MPD) used for adaptive bitrate streaming according to Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (known as DASH), or another suitable file according to any other suitable adaptive streaming protocol.
[006] In some examples, a method for processing video data is provided. The method can comprise obtaining a media file associated with 360-degree video data, the 360-degree video data including a spherical representation of a scene, the media file including first signaling information and second signaling information for a viewport region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture comprising the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane. The method may further comprise extracting pixels corresponding to the viewport region from the picture data based on the first signaling information and the second signaling information, and providing the pixels to render the viewport region for display.
[007] In some aspects, the first signaling information may include a first angle and a second angle of a center of the viewport region with respect to a spherical center of the spherical representation of the scene, the first angle being formed in a first plane and the second angle being formed in a second plane, where the first plane is perpendicular to the second plane.
[008] In some aspects, the first signaling information may additionally include a third angle associated with a width of the viewport region and a fourth angle associated with a height of the viewport region.
[009] In some aspects, the third angle can be formed between a first edge and a second edge of the viewport region, and the fourth angle can be formed between a third edge and a fourth edge of the viewport region.
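For illustration, the following Python sketch shows one way the four angles described above could be represented and how the viewport center could be converted into a direction vector in the spherical space. The field names and the degree units are assumptions made for this example, not syntax defined by the patent.

import math
from dataclasses import dataclass

@dataclass
class SphericalViewport:
    """Illustrative container for the 'first signaling information'."""
    center_azimuth: float    # first angle, degrees, formed in the horizontal plane
    center_elevation: float  # second angle, degrees, formed in the vertical plane
    azimuth_range: float     # third angle, degrees, viewport width on the sphere
    elevation_range: float   # fourth angle, degrees, viewport height on the sphere

    def center_direction(self):
        """Unit vector from the sphere center toward the viewport center."""
        az = math.radians(self.center_azimuth)
        el = math.radians(self.center_elevation)
        return (math.cos(el) * math.cos(az),
                math.cos(el) * math.sin(az),
                math.sin(el))

# Example: a viewport centered slightly left of and above the front direction.
vp = SphericalViewport(center_azimuth=-30.0, center_elevation=10.0,
                       azimuth_range=90.0, elevation_range=60.0)
print(vp.center_direction())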
[0010] In some aspects, the ROI can be defined by at least four planes that intersect with the spherical representation, and each of the four planes also intersects with the spherical center. In some aspects, the shape of the viewport region can be determined based on the intersections of the at least four planes with the spherical representation. In some aspects, the pixels corresponding to the viewport region are extracted based on the shape.
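Because each bounding plane passes through the sphere center, a point on the sphere lies inside the viewport exactly when it is on the interior side of all of the planes. The Python sketch below illustrates this test; the plane normals and the sign convention are assumptions chosen for the example.

import numpy as np

def inside_viewport(direction, plane_normals):
    """Return True if the unit vector `direction` lies on the interior side of
    every bounding plane.  Each plane passes through the sphere center, so it
    is fully described by its normal; the viewport is taken here to be the
    spherical region where all dot products are non-negative."""
    d = np.asarray(direction, dtype=float)
    return all(float(np.dot(n, d)) >= 0.0 for n in plane_normals)

# Example: four planes whose normals point "inward" around the +x axis,
# giving a roughly square viewport centered on (1, 0, 0).
normals = [np.array([0.5,  1.0, 0.0]),   # left boundary
           np.array([0.5, -1.0, 0.0]),   # right boundary
           np.array([0.5, 0.0,  1.0]),   # bottom boundary
           np.array([0.5, 0.0, -1.0])]   # top boundary
print(inside_viewport((1.0, 0.0, 0.0), normals))   # True: viewport center
print(inside_viewport((-1.0, 0.0, 0.0), normals))  # False: behind the viewer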
[0011] In some aspects, the picture may include a plurality of tiles. The second signaling information can define one or more tiles of the picture that include the viewport region. In some aspects, the method may further comprise obtaining the one or more tiles from the plurality of tiles based on the second signaling information and extracting the pixels from the one or more tiles.
[0012] In some aspects, the second signaling information may include one or more coordinates associated with the one or more tiles in the picture. The one or more tiles can form a tile group, and the second signaling information can include a group identifier associated with the tile group. In some aspects, the plurality of tiles are motion-constrained tiles.
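A minimal sketch of how such tile-based second signaling information might be represented and used to locate the signaled tiles in the projected picture. The grid dimensions, field names, and raster-scan tile indexing are assumptions for this illustration.

from dataclasses import dataclass

@dataclass
class TileRegionSignaling:
    """Illustrative 'second signaling information' for a tiled picture.
    An actual file could instead carry only a tile-group identifier that
    maps to these tile indices."""
    group_id: int
    tile_columns: int          # tiles per picture row
    tile_width: int            # luma samples per tile, horizontally
    tile_height: int           # luma samples per tile, vertically
    tile_indices: tuple        # raster-scan indices of tiles covering the viewport

def tile_pixel_rect(signaling, tile_index):
    """Top-left corner and size, in pixels, of one tile in the projected picture."""
    row, col = divmod(tile_index, signaling.tile_columns)
    return (col * signaling.tile_width, row * signaling.tile_height,
            signaling.tile_width, signaling.tile_height)

sig = TileRegionSignaling(group_id=3, tile_columns=8,
                          tile_width=480, tile_height=540,
                          tile_indices=(2, 3, 10, 11))
for t in sig.tile_indices:
    print(t, tile_pixel_rect(sig, t))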
[0013] In some aspects, the second signaling information may include pixel coordinates associated with a predetermined location within a viewport region formed by projecting the ROI onto a plane, a width of the viewport region, and a height of the viewport region. The media file can be based on an International Organization for Standardization (ISO) base media file format (ISOBMFF). The media file can identify a sample group that includes a video sample corresponding to the spherical video of the scene, and the first signaling information and the second signaling information are included in one or more syntax elements of the sample group.
[0014] In some aspects, the media file may be based on a media presentation description (MPD) format and include one or more adaptation sets. Each of the one or more adaptation sets may include one or more representations. The first signaling information, the second signaling information, and a link to the picture can be included in one or more elements associated with the ROI included in the one or more representations. In some aspects, the method may additionally comprise obtaining the picture based on the link included in the media file.
[0015] In some aspects, the one or more representations can be tile-based representations, and the second signaling information can include identifiers associated with tiles that include the ROI included in the one or more tile-based representations.
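For illustration only, the Python sketch below builds and parses a hypothetical MPD fragment in which a supplemental property of an adaptation set carries the two sets of signaling information and a tile-based representation links to the picture data. The schemeIdUri, the comma-separated value layout, and the representation naming are assumptions for this example; they are not the descriptors defined by the patent or by DASH.

import xml.etree.ElementTree as ET

# Hypothetical MPD fragment: DASH clients ignore supplemental properties
# whose scheme they do not recognize, so the scheme shown here is invented.
MPD_FRAGMENT = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <SupplementalProperty schemeIdUri="urn:example:viewport:2017"
          value="-30.0,10.0,90.0,60.0,960,270,1920,1080"/>
      <Representation id="tile_2_3_10_11" bandwidth="5000000"
          width="1920" height="1080">
        <BaseURL>roi_tiles.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

ns = {'dash': 'urn:mpeg:dash:schema:mpd:2011'}
root = ET.fromstring(MPD_FRAGMENT)
prop = root.find('.//dash:SupplementalProperty', ns)
values = [float(v) for v in prop.get('value').split(',')]
first_signaling, second_signaling = values[:4], values[4:]
print(first_signaling, second_signaling)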
[0016] In some aspects, the spherical representation of the scene can be projected onto the plane using rectilinear projection.
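As a point of reference, rectilinear projection maps a direction on the sphere onto the plane tangent to the sphere at the viewport center (the gnomonic projection). A minimal sketch, assuming azimuth/elevation angles in radians:

import math

def rectilinear_project(azimuth, elevation, center_azimuth, center_elevation):
    """Gnomonic (rectilinear) projection of a point on the sphere onto the
    plane tangent at the viewport center.  Returns normalized plane
    coordinates (x, y).  Points more than 90 degrees away from the center
    have no rectilinear image."""
    d_az = azimuth - center_azimuth
    cos_c = (math.sin(center_elevation) * math.sin(elevation) +
             math.cos(center_elevation) * math.cos(elevation) * math.cos(d_az))
    if cos_c <= 0.0:
        raise ValueError("point is outside the projectable hemisphere")
    x = math.cos(elevation) * math.sin(d_az) / cos_c
    y = (math.cos(center_elevation) * math.sin(elevation) -
         math.sin(center_elevation) * math.cos(elevation) * math.cos(d_az)) / cos_c
    return x, y

# The viewport center itself maps to the plane origin.
print(rectilinear_project(0.0, 0.0, 0.0, 0.0))   # (0.0, 0.0)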
[0017] In some aspects, the method may additionally comprise extracting pixels of multiple ROIs from the picture based on the first signaling information and the second signaling information.
[0018] In some examples, an apparatus for processing video data is provided. The apparatus may comprise a memory configured to store 360-degree video data and a processor configured to: obtain a media file associated with the 360-degree video data, the 360-degree video data including a spherical representation of a scene, the media file including first signaling information and second signaling information for a viewport region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture that comprises the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane. The processor can be further configured to extract pixels corresponding to the viewport region from the picture data based on the first signaling information and the second signaling information and provide the pixels to render the viewport region for display.
[0019] In some aspects, the processor is additionally configured to determine, from the first signaling information, a first angle and a second angle of a center of the viewport region with respect to a spherical center of the spherical representation of the scene, the first angle being formed in a first plane and the second angle being formed in a second plane, where the first plane is perpendicular to the second plane.
[0020] In some aspects, the processor is additionally configured to determine, from the first signaling information, a third angle associated with a width of the viewport region and a fourth angle associated with a height of the viewport region.
[0021] In some aspects, the third angle is formed between a first edge and a second edge of the viewport region, and the fourth angle is formed between a third edge and a fourth edge of the viewport region. In some aspects, the ROI is defined by at least four planes that intersect with the spherical representation, and each of the four planes also intersects with the spherical center.
[0022] In some aspects, the processor is additionally configured to determine a shape of the viewport region based on the intersections of the at least four planes with the spherical representation.
[0023] In some aspects, the processor is configured to extract the pixels corresponding to the viewport region based on the shape.
[0024] In some aspects, the picture may include a plurality of tiles, and the second signaling information may define one or more tiles of the picture that include the viewport region. The processor is further configured to obtain the one or more tiles from the plurality of tiles based on the second signaling information and extract the pixels from the one or more tiles.
[0025] In some aspects, the processor is additionally configured to determine, based on the second signaling information, one or more coordinates associated with the one or more tiles in the picture.
[0026] In some aspects, the one or more tiles form a tile group. The processor is additionally configured to determine, from the second signaling information, a group identifier associated with the tile group. In some aspects, the plurality of tiles are motion-constrained tiles.
[0027] In some aspects, the processor is additionally configured to determine, from the second signaling information, pixel coordinates associated with a predetermined location within a viewport region formed by projecting the ROI onto a plane, a width of the viewport region, and a height of the viewport region.
[0028] In some aspects, the media file is based on an International Organization for Standardization (ISO) base media file format (ISOBMFF). The media file can identify a sample group that includes a video sample corresponding to the spherical video of the scene, and the processor is additionally configured to extract the first signaling information and the second signaling information from one or more syntax elements of the sample group.
[0029] In some aspects, the media file is based on a media presentation description (MPD) format and includes one or more adaptation sets. Each of the one or more adaptation sets may include one or more representations. The processor is additionally configured to determine, based on one or more elements associated with the ROI included in the one or more representations, the first signaling information, the second signaling information, and a link to the picture, and to obtain the picture based on the link included in the media file.
[0030] In some aspects, the one or more representations are tile-based representations. The processor is further configured to determine, based on the second signaling information, identifiers associated with tiles that include the ROI included in the one or more tile-based representations.
[0031] In some aspects, the spherical representation of the scene is projected onto the plane using rectilinear projection.
[0032] In some aspects, the processor is additionally configured to extract pixels of multiple ROIs from the picture based on the first signaling information and the second signaling information.
[0033] In some aspects, the apparatus can comprise a mobile device with one or more cameras to capture the 360-degree video data. In some aspects, the apparatus may comprise a display to render the viewport region.
[0034] In some examples, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium may have stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain a media file associated with 360-degree video data, where the 360-degree video data includes a spherical representation of a scene, and the media file includes first signaling information and second signaling information for a viewport region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture that comprises the viewport region, where the picture is formed by projecting the spherical representation that includes the ROI onto a plane; extract pixels corresponding to the viewport region from the picture data based on the first signaling information and the second signaling information; and provide the pixels to render the viewport region for display.
[0035] In some examples, a method for processing video data is provided. The method may comprise: obtaining 360-degree video data, the 360-degree video data including a spherical representation of a scene; determining a region of interest (ROI) in the spherical representation of the scene; generating a media file that includes first signaling information and second signaling information for a viewport region corresponding to the ROI, where the first signaling information includes a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicates a region of a picture comprising the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane; and providing the media file to render the 360-degree video data or to transmit a portion of the 360-degree video data that includes at least the ROI.
[0036] In some examples, an apparatus for processing video data is provided. The apparatus may comprise a memory configured to store 360-degree video data and a processor configured to: obtain the 360-degree video data, the 360-degree video data including a spherical representation of a scene; determine a region of interest (ROI) in the spherical representation of the scene; generate a media file that includes first signaling information and second signaling information for a viewport region corresponding to the ROI, the first signaling information including a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture which comprises the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane; and provide the media file to render the 360-degree video data or to transmit a portion of the 360-degree video data that includes at least the ROI.
[0037] In some examples, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium may have stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain 360-degree video data, the 360-degree video data including a spherical representation of a scene; determine a region of interest (ROI) in the spherical representation of the scene; generate a media file that includes first signaling information and second signaling information for a viewport region corresponding to the ROI, the first signaling information including a center position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture that comprises the viewport region, the picture being formed by projecting the spherical representation that includes the ROI onto a plane; and provide the media file for rendering the 360-degree video data or for transmitting a portion of the 360-degree video data including at least the ROI.
[0038] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.
[0039] The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS [0040] Illustrative embodiments of the present invention are described in detail below with reference to the following drawing figures:
[0041] Figure 1 is a block diagram illustrating an example of an encoding device and a decoding device, in accordance with some examples;
[0042] Figure 2A and Figure 2B are diagrams that illustrate examples of video frames captured by omnidirectional cameras that use a fisheye lens to capture a wide field of view, in accordance with some examples;
[0043] Figure 3 is a diagram that illustrates an example of an equirectangular video frame, according to some examples;
[0044] Figure 4A, Figure 4B, Figure 4C, Figure 4D and Figure 4E are diagrams that illustrate examples of an equirectangular video frame and signaling of a viewport corresponding to a region of interest (ROI) in the video frame, according to some examples;
[0045] Figure 5A, Figure 5B and Figure 5C are diagrams that illustrate examples of a viewport and definitions of an ROI, in accordance with some examples;
[0046] Figure 6A and Figure 6B illustrate two-dimensional video frames and signaling of a viewport for an ROI on the two-dimensional video frames;
[0047] Figure 7 and Figure 8 provide examples of a media file that contains signaling information for a viewport, in accordance with some examples;
[0048] Figure 9 is a diagram that illustrates a video streaming system, according to some examples;
[0049] Figure 10 provides a graphical representation of an example of an MPD file, in accordance with some examples;
[0050] Figure 11 is a representation of XML code that illustrates an example of signaling a viewport corresponding to an ROI in an MPD file, in accordance with some examples;
[0051] Figure 12 and Figure 13 are flowcharts that illustrate exemplary processes for processing video data, in accordance with some examples;
[0052] Figure 14 is a block diagram illustrating an example video encoding device, in accordance with some examples; and [0053] Figure 15 is a block diagram illustrating an example video decoding device, in accordance with some examples.
DETAILED DESCRIPTION [0054] Certain aspects and embodiments of the present disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination, as will be apparent to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be limiting.
[0055] The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
[0056] Specific details are provided in the following description of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0057] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0058] The term computer-readable medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disc (CD) or digital versatile disc (DVD), flash memory, memory, or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.
[0059] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (for example, a computer program product) may be stored in a computer-readable or machine-readable medium. A processor (or processors) may perform the necessary tasks.
[0060] Video content can be captured and coded as 360-degree video content (also referred to as virtual reality (VR) content). As described in more detail below, one or more systems and methods described in this document are directed to generating media files for 360-degree video content that include signaling information for one or more regions of interest (ROIs) in the video content. One or more systems and methods described in this document are also directed to processing the signaling information included in the media files to extract the ROI from the video content for rendering. The video content can be a spherical video formed by stitching together a set of images that capture a scene at certain moments in time. An ROI of a 360-degree video picture can be a predetermined region of the picture that captures a certain portion of the scene. In some cases, an ROI may correspond to a dynamically determined portion of the scene (for example, a portion of the scene currently viewed by the user). A media file can include first signaling information and second signaling information for the ROI. The first signaling information can include a first location of the ROI and dimension information of the ROI in a three-dimensional spherical space corresponding to the spherical video. The second signaling information can include a second location of the ROI in a two-dimensional space formed by projecting the spherical space onto a plane. In some examples, the dual signaling can provide a mapping between the first location and the second location of the ROI. The mapping can facilitate both the transmission and the rendering of the spherical video data.
[0061] 360-degree video can include virtual reality video, augmented reality data, or any other type of 360-degree video content, whether captured, computer generated, or the like. For example, 360-degree video can provide the ability to be virtually present in a non-physical world created by the rendering of natural and/or synthetic images (and, in some cases, sound) correlated with the movements of the immersed user, allowing the user to interact with that world. 360-degree video can represent a three-dimensional environment that can be interacted with in a seemingly real or physical way. In some cases, a user experiencing a 360-degree video environment uses electronic equipment, such as a head-mounted display (HMD) and, optionally, certain tools or clothing (for example, gloves fitted with sensors), to interact with the virtual environment. As the user moves in the real world, the images rendered in the virtual environment also change, giving the user the perception that the user is moving within the virtual environment. In some cases, the virtual environment includes sound that correlates with the user's movements, giving the user the impression that the sounds originate from a particular direction or source. 360-degree video can be captured and rendered at very high quality, potentially providing a truly immersive 360-degree video or virtual reality experience. 360-degree video applications include electronic games, training, education, sports video, online shopping, and more.
[0062] 360-degree video is video captured for display in a 360-degree environment. In some applications, video from the real world can be used in the presentation of a virtual reality environment, as opposed to computer-generated graphics such as may be found in gaming or virtual worlds. In these applications, a user can experience another location in the same way that the user can experience the user's present location. For example, a user can experience a walking tour of Berlin while using a 360-degree video system that is located in San Francisco.
[0063] A 360-degree video system can include a video capture device and a video display device, and possibly also other intermediate devices, such as servers, data storage, and data transmission equipment. A video capture device can include a camera set, which can include a set of multiple cameras, each oriented in a different direction and capturing a different view. In one illustrative example, six cameras can be used to capture a full 360-degree view centered on the location of the camera set. Some video capture devices may use fewer cameras. For example, some video capture devices capture primarily side-to-side views or use lenses with a wide field of view. In one illustrative example, one or more cameras equipped with two fisheye lenses, positioned back to back, can be used to capture two images that together provide a 360-degree field of view. A video generally includes frames or pictures, where a frame or picture is an electronically coded still image of a scene. Cameras capture a certain number of frames per second, which is commonly referred to as the camera's frame rate.
[0064] In some cases, in order to provide a seamless 360-degree view, image stitching can be performed on the video frames (or images) captured by each of the cameras in the camera set. Image stitching in the case of 360-degree video generation involves combining or merging video frames from adjacent cameras (or lenses) in the area where the video frames overlap or otherwise connect. The result is an approximately spherical picture and, similar to a Mercator projection, the stitched data can be represented in a planar fashion. For example, the pixels in a stitched video frame can be mapped onto the planes of a cube shape or some other three-dimensional planar shape (for example, a pyramid, an octahedron, a decahedron, etc.). Video capture and video display devices can operate on a raster principle - meaning that a video frame is treated as a grid of pixels - in which case square planes, rectangular planes, or other suitably shaped planes can be used to represent a spherical environment.
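One common planar representation is the equirectangular projection, in which azimuth maps linearly to the horizontal pixel axis and elevation to the vertical pixel axis. A minimal Python sketch of this mapping, under one common sign and range convention (other conventions differ only in offsets and signs):

import math

def sphere_to_equirect(azimuth, elevation, width, height):
    """Map a direction on the sphere (angles in radians, azimuth in [-pi, pi],
    elevation in [-pi/2, pi/2]) to pixel coordinates in an equirectangular
    picture of size width x height."""
    u = (azimuth / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - elevation / math.pi) * height
    return u, v

# The front direction lands in the middle of the projected picture.
print(sphere_to_equirect(0.0, 0.0, 3840, 1920))   # (1920.0, 960.0)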
[0065] 360-degree video frames, mapped onto a planar representation, can be encoded and/or compressed for storage and/or transmission. The encoding and/or compression can be performed using a video codec (for example, code that conforms to the High Efficiency Video Coding (HEVC) standard, which is also known as H.265, the Advanced Video Coding standard, which is known as H.264, or another suitable codec) and results in a compressed video bitstream (or encoded video bitstream) or group of bitstreams. The encoding of video data using a video codec is described in more detail below.
[0066] In some implementations, the encoded video bitstream (or bitstreams) can be stored and/or encapsulated in a media format or file format. The stored bitstream (or bitstreams) can be transmitted, for example, over a network, to a receiving device that can decode and render the video for display. Such a receiving device may be referred to herein as a video display device. For example, a 360-degree video system can generate encapsulated files from the encoded video data (for example, using an International Standards Organization (ISO) base media file format and/or derived file formats). For example, the video codec can encode the video data, and an encapsulation engine can generate the media files by encapsulating the video data in one or more ISO format media files. Alternatively or additionally, the stored bitstream (or bitstreams) can be provided directly from a storage medium to a receiving device.
[0067] A receiving device can also implement a codec to decode and/or decompress an encoded video bitstream. In cases where the encoded video bitstream (or bitstreams) is stored and/or encapsulated in a media format or file format, the receiving device can support the media or file format that was used to package the video bitstream into a file (or files), and can extract the video (and possibly also audio) data to generate the encoded video data. For example, the receiving device can parse the media files with the encapsulated video data to generate the encoded video data, and the codec in the receiving device can decode the encoded video data.
[0068] The receiving device can then send the decoded video signal to a rendering device (for example, a video display device, player device, or other suitable rendering device). Rendering devices include, for example, head-mounted displays (HMDs), virtual reality televisions, and other devices with 180-degree or 360-degree displays. In general, a head-mounted display can track the movement of a user's head and/or the movement of the user's eyes. The head-mounted display can use the tracking information to render the part of a 360-degree video that corresponds to the direction in which the user is looking, so that the user experiences the virtual environment the same way he or she would experience the real world. A rendering device can render a video at the same frame rate at which the video was captured or at a different frame rate.
[0069] The video pictures of 360-degree video content can be encoded as a single-layer bitstream using temporal inter-prediction (TIP), and the entire encoded bitstream can be stored on a server. In some cases, the pictures of 360-degree video content can be encoded as a multi-layer bitstream using TIP and inter-layer prediction (ILP). If needed, the bitstream can be transmitted to the receiver side, fully decoded by the decoder, and the region of the decoded picture corresponding to a portion of the scene being viewed by the user (for example, determined based on the movement of the user's head and/or eyes) can be rendered to the user.
[0070] Figure 1 is a block diagram illustrating an example of a video coding system 100 that includes an encoding device 104 and a decoding device 112. The encoding device 104 may be part of a source device, and the decoding device 112 may be part of a receiving device. The source device and/or the receiving device may include an electronic device, such as a mobile or stationary telephone handset (for example, a smartphone, cellular phone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, an Internet Protocol (IP) camera, or any other suitable electronic device. In some examples, the source device and the receiving device may include one or more wireless transceivers for wireless communications. The coding techniques described in this document are applicable to video coding in various multimedia applications, including streaming video transmissions (for example, over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 100 may support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting, video gaming, and/or video telephony.
[0071] The encoding device 104 (or encoder) can be used to encode video data using a video coding standard or protocol to generate an encoded video bitstream. Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and High Efficiency Video Coding (HEVC) or ITU-T H.265. Various extensions to HEVC deal with multi-layer video coding, including the range and screen content coding extensions, 3D video coding (3D-HEVC), the multiview extension (MV-HEVC), and the scalable extension (SHVC). HEVC and its extensions were developed by the Joint Collaboration Team on Video Coding (JCT-VC), as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V), of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). MPEG and ITU-T VCEG have also formed a Joint Video Exploration Team (JVET) to explore new coding tools for the next generation of video coding standard. The reference software is called the JEM (joint exploration model).
[0072] Many embodiments described in this document provide examples using the JEM, the HEVC standard, and/or extensions thereof. However, the techniques and systems described in this document may also be applicable to other coding standards, such as AVC, MPEG, extensions thereof, or other suitable coding standards already available or not yet available or developed. Accordingly, although the techniques and systems described in this document may be described with reference to a particular video coding standard, one of ordinary skill in the art will appreciate that the description should not be interpreted to apply only to that particular standard.
[0073] Referring to Figure 1, a video source 102 can provide the video data to the encoding device 104. The video source 102 can be part of the source device, or it can be part of a device other than the source device. The video source 102 can include a video capture device (for example, a video camera, a camera phone, a video phone, or the like), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.
[0074] The video data from the video source 102 may include one or more input pictures or frames. A picture or frame of a video is a still image of a scene. The encoding mechanism 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bitstream. In some examples, an encoded video bitstream (or video bitstream or bitstream) is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs) starting with an AU that has a random access point picture in the base layer and with certain properties, up to and not including a next AU that has a random access point picture in the base layer and with certain properties. For example, certain properties of a random access point picture that starts a CVS may include a RASL flag (for example, NoRaslOutputFlag) equal to 1. Otherwise, a random access point picture (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more coded pictures and control information corresponding to the coded pictures that share the same output time. Coded slices of pictures are encapsulated at the bitstream level into data units called network abstraction layer (NAL) units. For example, an HEVC video bitstream can include one or more CVSs including NAL units. Each of the NAL units has a NAL unit header. In one example, the header is one byte for H.264/AVC (except for multi-layer extensions) and two bytes for HEVC. The syntax elements in the NAL unit header take the designated bits and are therefore visible to all kinds of systems and transport layers, such as Transport Stream, Real-time Transport Protocol (RTP), file format, among others.
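For reference, the two-byte HEVC NAL unit header mentioned above packs four fixed-width fields. The following Python sketch decodes them from a header value; it is an illustration of the bit layout, not code from the patent.

def parse_hevc_nal_header(two_bytes):
    """Decode the two-byte HEVC NAL unit header: 1-bit forbidden_zero_bit,
    6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1."""
    value = int.from_bytes(two_bytes, 'big')
    return {
        'forbidden_zero_bit':    (value >> 15) & 0x1,
        'nal_unit_type':         (value >> 9) & 0x3F,
        'nuh_layer_id':          (value >> 3) & 0x3F,
        'nuh_temporal_id_plus1': value & 0x7,
    }

# 0x4001: nal_unit_type 32 (VPS), layer 0, temporal_id_plus1 of 1 (TemporalId 0).
print(parse_hevc_nal_header(b'\x40\x01'))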
[0075] Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes one slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that relates to one or more coded pictures. In some cases, a NAL unit can be referred to as a packet. An HEVC AU includes VCL NAL units containing coded picture data and non-VCL NAL units (if any) corresponding to the coded picture data.
[0076] NAL units may contain a sequence of bits forming a coded representation of the video data (for example, an encoded video bitstream, a CVS of a bitstream, or the like), such as coded representations of pictures in a video. The encoding mechanism 106 generates coded representations of pictures by dividing each picture into multiple slices. A slice is independent of other slices so that the information in the slice is coded without dependency on data from other slices within the same picture. A slice includes one or more slice segments including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments. The slices are then partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).
[0077] The luma and chroma CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luma component or a chroma component that uses the same motion parameters for inter-prediction or intra-block copy prediction (when available or enabled for use). The luma PB and the one or more chroma PBs, together with the associated syntax, form a prediction unit (PU). For inter-prediction, a set of motion parameters (for example, one or more motion vectors, reference indices, or the like) is signaled in the bitstream for each PU and is used for inter-prediction of the luma PB and the one or more chroma PBs. The motion parameters can also be referred to as motion information. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which the same two-dimensional transform is applied for coding a prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples and the corresponding syntax elements.
[0078] A size of a CU corresponds to a size of the coding mode and may be square in shape. For example, a size of a CU may be 8 x 8 samples, 16 x 16 samples, 32 x 32 samples, 64 x 64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase N x N is used in this document to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions (for example, 8 pixels x 8 pixels). The pixels in a block may be arranged in rows and columns. In some embodiments, blocks may not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is intra-prediction mode coded or inter-prediction mode coded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square in shape.
[0079] According to the HEVC standard, transformations can be performed using transform units (TUs). TUs can vary for different CUs. The TUs can be sized based on the size of the PUs within a given CU. The TUs can be the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU can be subdivided into smaller units using a quadtree structure known as a residual quad tree (RQT). Leaf nodes of the RQT can correspond to TUs. The pixel difference values associated with the TUs can be transformed to produce transform coefficients. The transform coefficients can then be quantized by the encoding mechanism 106.
[0080] Once the pictures of the video data are partitioned into CUs, the encoding mechanism 106 predicts each PU using a prediction mode. The prediction unit or prediction block is then subtracted from the original video data to obtain residuals (described below). For each CU, a prediction mode can be signaled within the bitstream using syntax data. A prediction mode can include intra-prediction (or intra-picture prediction) or inter-prediction (or inter-picture prediction). Intra-prediction uses the correlation between spatially neighboring samples within a picture. For example, using intra-prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, direction prediction to extrapolate from neighboring data, or any other suitable types of prediction. Inter-prediction uses the temporal correlation between pictures in order to derive a motion-compensated prediction for a block of image samples. For example, using inter-prediction, each PU is predicted using motion compensated prediction from image data in one or more reference pictures (before or after the current picture in output order). The decision whether to code a picture area using inter-picture or intra-picture prediction can be made, for example, at the CU level.
[0081] In some examples, one or more slices of a picture are assigned a slice type. Slice types include an I slice, a P slice, and a B slice. An I slice (intra-frames, independently decodable) is a slice of a picture that is coded only by intra-prediction and is therefore independently decodable, since the I slice requires only the data within the frame to predict any prediction unit or prediction block of the slice. A P slice (uni-directional predicted frames) is a slice of a picture that can be coded with intra-prediction and with uni-directional inter-prediction. Each prediction unit or prediction block within a P slice is coded with either intra-prediction or inter-prediction. When inter-prediction applies, the prediction unit or prediction block is predicted from only one reference picture, and therefore reference samples are from only one reference region of one frame. A B slice (bi-directional predictive frames) is a slice of a picture that can be coded with intra-prediction and with inter-prediction (for example, either bi-prediction or uni-prediction). A prediction unit or prediction block of a B slice can be bi-directionally predicted from two reference pictures, with each picture contributing one reference region, and the sample sets of the two reference regions are weighted (with equal weights or with different weights) to produce the prediction signal of the bi-directionally predicted block. As explained above, the slices of one picture are independently coded. In some cases, a picture can be coded as just one slice.
[0082] A PU can include data (for example, motion parameters or other data) related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU can describe, for example, a horizontal component of the motion vector (Δx), a vertical component of the motion vector (Δy), a resolution for the motion vector (for example, integer precision, one-quarter pixel precision, or one-eighth pixel precision), a reference picture to which the motion vector points, a reference index, a reference picture list (for example, List 0, List 1, or List C) for the motion vector, or any combination thereof.
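The per-PU inter-prediction data listed above can be thought of as one record per PU. A minimal, illustrative grouping in Python (the field names and the choice of quarter-sample fractions are assumptions for this sketch):

from dataclasses import dataclass
from fractions import Fraction

@dataclass
class MotionParameters:
    """Illustrative grouping of the per-PU inter-prediction data listed above."""
    mv_horizontal: Fraction    # horizontal motion vector component, in luma samples
    mv_vertical: Fraction      # vertical motion vector component, in luma samples
    reference_list: str        # 'L0', 'L1', or a combined list
    reference_index: int       # index of the reference picture within that list

mv = MotionParameters(mv_horizontal=Fraction(5, 4),   # 1.25 samples right
                      mv_vertical=Fraction(-3, 4),    # 0.75 samples up
                      reference_list='L0',
                      reference_index=0)
print(mv)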
[0083] The encoding device 104 can then perform transformation and quantization. For example, after the prediction, the encoding mechanism 106 can calculate residual values corresponding to the PU. Residual values can comprise pixel difference values between the current block of pixels being coded (the PU) and the prediction block used to predict the current block (for example, the predicted version of the current block). For example, after generating a prediction block (for example, using inter-prediction or intra-prediction), the encoding mechanism 106 can generate a residual block by subtracting the prediction block produced by a prediction unit from the current block. The residual block includes a set of pixel difference values that quantify the differences between the pixel values of the current block and the pixel values of the prediction block. In some examples, the residual block can be represented in a two-dimensional block format (for example, a two-dimensional matrix or array of pixel values). In such examples, the residual block is a two-dimensional representation of the pixel values.
[0084] Any residual data that may remain after prediction is performed is transformed using a block transform, which can be based on a discrete cosine transform, a discrete sine transform, an integer transform, a wavelet transform, another suitable transform function, or any combination thereof. In some cases, one or more block transforms (for example, sizes 32 x 32, 16 x 16, 8 x 8, 4 x 4, or other suitable sizes) can be applied to the residual data in each CU. In some embodiments, a TU can be used for the transform and quantization processes implemented by the encoding mechanism 106. A given CU having one or more PUs can also include one or more TUs. As described in more detail below, the residual values can be transformed into transform coefficients using the block transforms, and can then be quantized and scanned using the TUs to produce serialized transform coefficients for entropy coding.
[0085] In some embodiments, following intra-predictive or inter-predictive coding using the PUs of a CU, the encoding mechanism 106 can calculate residual data for the TUs of the CU. The PUs can comprise pixel data in the spatial domain (or pixel domain). The TUs can comprise coefficients in the transform domain following application of a block transform. As previously noted, the residual data can correspond to pixel difference values between pixels of the unencoded picture and prediction values corresponding to the PUs. The encoding mechanism 106 can form the TUs including the residual data for the CU, and can then transform the TUs to produce transform coefficients for the CU.
[0086] The encoding mechanism 106 can perform quantization of the transform coefficients. Quantization provides further compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization can reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
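A trivial Python sketch of the rounding step just described, where dropping the n - m low-order bits is what makes quantization lossy (the specific values are made up for the example):

def quantize_coefficient(value, n_bits, m_bits):
    """Round an n-bit coefficient down to m bits of precision (m < n).
    Dequantization would shift back up, so the low-order bits are lost."""
    shift = n_bits - m_bits
    return value >> shift

print(quantize_coefficient(0b10110111, n_bits=8, m_bits=5))   # 0b10110 == 22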
[0087] Once quantization is performed, the encoded video bitstream includes quantized transform coefficients, prediction information (for example, prediction modes, motion vectors, block vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the encoded video bitstream can then be entropy encoded by the encoding mechanism 106. In some examples, the encoding mechanism 106 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, the encoding mechanism 106 may perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (for example, a one-dimensional vector), the encoding mechanism 106 can entropy encode the vector. For example, the encoding mechanism 106 may use context-adaptive variable-length coding, context-adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy coding technique.
[0088] As described earlier, an HEVC bitstream includes a group of NAL units, including VCL NAL units and non-VCL NAL units. The VCL NAL units include coded picture data that forms an encoded video bitstream. For example, the sequence of bits forming the coded video bitstream is present in the VCL NAL units. Non-VCL NAL units can contain parameter sets with high-level information relating to the encoded video bitstream, in addition to other information. For example, a parameter set can include a video parameter set (VPS), a sequence parameter set (SPS), and a picture parameter set (PPS). Examples of goals of the parameter sets include bit rate efficiency, error resiliency, and providing systems layer interfaces. Each slice references a single active PPS, SPS, and VPS to access information that the decoding device 112 can use to decode the slice. An identifier (ID) can be coded for each parameter set, including a VPS ID, an SPS ID, and a PPS ID. An SPS includes an SPS ID and a VPS ID. A PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using the IDs, the active parameter sets can be identified for a given slice.
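The ID chain described above (slice header -> PPS -> SPS -> VPS) can be followed with simple table lookups. A minimal Python sketch, assuming the parameter sets have already been parsed into dictionaries keyed by their IDs:

def active_parameter_sets(slice_pps_id, pps_table, sps_table, vps_table):
    """Follow the ID chain: the slice header gives a PPS ID, the PPS names
    its SPS, and the SPS names its VPS."""
    pps = pps_table[slice_pps_id]
    sps = sps_table[pps['sps_id']]
    vps = vps_table[sps['vps_id']]
    return vps, sps, pps

vps_table = {0: {'vps_id': 0}}
sps_table = {0: {'sps_id': 0, 'vps_id': 0}}
pps_table = {0: {'pps_id': 0, 'sps_id': 0}}
print(active_parameter_sets(0, pps_table, sps_table, vps_table))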
[0089] A PPS includes information that applies to all slices in a given picture. Because of this, all slices in a picture refer to the same PPS. Slices in different pictures can also refer to the same PPS. An SPS includes information that applies to all pictures in the same coded video sequence (CVS) or bit stream. As previously described, a coded video sequence is a series of access units (AUs) that begins with a random access point picture (for example, an instantaneous decoding refresh (IDR) picture, a broken link access (BLA) picture or another suitable random access point picture) in the base layer and with certain properties (described above), up to and not including a next AU that has a random access point picture in the base layer and with certain properties (or the end of the bit stream). The information in an SPS may not change from picture to picture within a coded video sequence. Pictures in a coded video sequence can use the same SPS. The VPS includes information that applies to all layers within a coded video sequence or bit stream. The VPS includes a syntax structure with syntax elements that apply to entire coded video sequences. In some embodiments, the VPS, SPS or PPS can be transmitted in-band with the encoded bit stream. In some embodiments, the VPS, SPS or PPS can be transmitted out-of-band, in a separate transmission from the NAL units that contain the encoded video data.
[0090] A video bit stream can also include supplemental enhancement information (SEI) messages. For example, an SEI NAL unit may be part of the video bit stream. In some cases, an SEI message may contain information that is not required by the decoding process. For example, the information in an SEI message may not be essential for the decoder to decode the video pictures of the bit stream, but the decoder can use the information to improve the display or processing of the pictures (for example, the decoded output). The information in an SEI message can be embedded metadata. In an illustrative example, the information in an SEI message can be used by entities on the decoder side to improve the viewability of the content. In some instances, certain application standards may require the presence of such SEI messages in the bit stream so that quality improvement can be achieved on all devices that conform to the application standard (for example, carriage of the frame packing SEI message for the frame-compatible stereoscopic 3DTV video format, in which the SEI message is carried for every frame of the video, handling of a recovery point SEI message, use of the pan-scan rectangle SEI message in DVB, plus many other examples).
[0091] The output 110 of the encoding device 104 can send the NAL units, which form the encoded video bit stream data, through the communication link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 can receive the NAL units. The communication link 120 may include a channel provided by a wireless network, a wired network, or a combination of a wired and wireless network. A wireless network can include any wireless interface or combination of wireless interfaces and can include any suitable wireless network (for example, the Internet or another wide area network, a packet-based network, WiFi™, radio frequency (RF), UWB, WiFi-Direct, cellular, Long Term Evolution (LTE), WiMax™ or the like). A wired network can include any wired interface (for example, fiber, ethernet, power line ethernet, ethernet over coaxial cable, digital subscriber line (DSL) or the like). Wired and/or wireless networks can be deployed using various equipment, such as base stations, routers, access points, bridges, gateways, switches or the like. The encoded video bit stream data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.
[0092] In some examples, the encoding device 104 can store encoded video bit stream data in the storage 108. The output 110 can retrieve the encoded video bit stream data from the encoding mechanism 106 or from the storage 108. The storage 108 can include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 may include a hard disk, a storage disk, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
[0093] The input 114 of the decoding device 112 receives the encoded video bit stream data and can provide the video bit stream data to the decoder mechanism 116, or to the storage 118 for later use by the decoder mechanism 116. The decoder mechanism 116 can decode the encoded video bit stream data by entropy decoding (for example, using an entropy decoder) and by extracting the elements of one or more coded video sequences that form the encoded video data. The decoder mechanism 116 can then rescale the encoded video bit stream data and perform an inverse transform on it. The residual data is then passed to a prediction stage of the decoder mechanism 116. The decoder mechanism 116 then predicts a block of pixels (for example, a PU). In some examples, the prediction is added to the output of the inverse transform (the residual data).
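A toy sketch of the final reconstruction step mentioned above (illustrative block sizes and sample values, not the actual decoder arithmetic) is:

```python
import numpy as np

# The inverse-transformed residual block is added to the predicted block and
# clipped to the valid sample range to form the reconstructed pixels.
prediction = np.full((4, 4), 128, dtype=np.int16)        # predicted PU samples
residual = np.arange(16, dtype=np.int16).reshape(4, 4)   # decoded residual data
reconstructed = np.clip(prediction + residual, 0, 255).astype(np.uint8)
print(reconstructed[0])  # [128 129 130 131]
```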
[0094] The decoding device 112 may output the decoded video to a video destination device 122, which may include a display or another output device for displaying the decoded video data to a consumer of the content. In some aspects, the video destination device 122 may be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 may be part of a separate device other than the receiving device.
[0095] In some embodiments, the video encoding device 104 and / or the video decoding device 112 can be integrated with an audio encoding device and audio decoding device, respectively. The video encoding device 104 and / or the video decoding device 112 also
may include other hardware or software that is required to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combination thereof. The video encoding device 104 and the video decoding device 112 can be integrated as part of a combined encoder/decoder (codec) in a respective device. An example of specific details of the encoding device 104 is described below with reference to Figure 14. An example of specific details of the decoding device 112 is described below with reference to Figure 15.
[0096] Extensions to the HEVC standard include the multiview video coding extension, referred to as MV-HEVC, and the scalable video coding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, with different layers being included in the encoded video bit stream. Each layer in a coded video sequence is addressed by a unique layer identifier (ID). A layer ID can be present in a header of a NAL unit to identify a layer with which the NAL unit is associated. In MV-HEVC, different layers can represent different views of the same scene in the video bit stream. In SHVC, different scalable layers are provided that represent the video bit stream in different spatial resolutions (or picture resolutions) or in different reconstruction fidelities. The scalable layers can include a base layer (with layer ID = 0) and one or more enhancement layers (with layer IDs = 1, 2, ... n). The base layer can conform to a profile of the first version of HEVC and represents the lowest available layer in a bit stream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate and/or reconstruction fidelity (or quality) compared to the base layer. The enhancement layers are organized hierarchically and may (or may not) depend on lower layers. In some examples, the different layers can be coded using a single-standard codec (for example, all layers are encoded using HEVC, SHVC or another coding standard). In some examples, different layers can be coded using a multi-standard codec. For example, a base layer can be coded using AVC, whereas one or more enhancement layers can be coded using the SHVC and/or MV-HEVC extensions to the HEVC standard.
[0097] In general, a layer includes a set of VCL NAL units and a corresponding set of non-VCL NAL units. NAL units are assigned a particular layer ID value. Layers can be hierarchical in the sense that a layer can depend on a lower layer. A set of
layers refers to a set of layers represented within a bit stream that is self-contained, meaning that the layers within a set of layers can depend on other layers in the set of layers during the decoding process, but do not depend on any other layers for decoding. Consequently, the layers in a set of layers can form an independent bit stream that can represent video content. The layers in a set of layers can be obtained from another bit stream by operation of a sub-bit stream extraction process. A set of layers can correspond to the set of layers that must be decoded when a decoder wants to operate according to certain parameters.
[0098] In some deployments, camera sets for capturing 360-degree video may include an omnidirectional camera, catadioptric camera (a camera that uses curved lenses and mirrors), a camera equipped with a fisheye lens and / or another suitable camera. An example of an omnidirectional camera is the Ricoh Theta-S, which uses two fisheye lenses that focus in opposite directions.
[0099] Omnidirectional cameras, such as catadioptric cameras and cameras with fisheye lenses, typically capture images with a significant amount of distortion. Figure 2A and Figure 2B illustrate examples of video frames captured by omnidirectional cameras that use a fisheye lens to capture a wide field of view. In the example of Figure 2A, the video frame 200 includes a circular fisheye image. Fisheye lenses are capable of capturing very wide angles, such as 180 degrees or greater. Therefore, a camera equipped with two fisheye lenses, positioned back to back, can capture two images that together provide 360 degrees of view (or more). By comparison, wide-angle lenses that are not fisheye lenses capture a field of view in the range of about 45 to about 90 degrees. A field of view can, alternatively or additionally, be expressed in radians.
[00100] In order to capture a wide angle, fisheye lenses distort the image of a scene. As illustrated in Figure 2A, the scene captured in the video frame 200 is circular in shape, and is warped from the center to the outer edges of this circular region. Because camera sensors are rectangular, the video frame 200 is rectangular and the image includes areas, illustrated here with stippling, that are not part of the scene. The pixels in these regions are considered unusable, since these pixels are not part of the scene.
[00101] The example of Figure 2B includes a video frame 202 that includes a full-frame fisheye image. In this type of video frame 202, a wide-angle field of view has also been captured in a circular region, with the scene being warped into the circular region. In this example, the image has been scaled (for example, made wider) so that the scene fills the edges of the rectangular frame. This exemplary video frame 202 does not include unusable areas, and some parts of the scene that could be captured by the lens have been cropped out and not captured.
[00102] As described above, other types of cameras can also be used to capture 360-degree video. For example, a camera set can include a set of multiple cameras (for example, 5, 6, 7 or another number of cameras needed to capture a sufficient number of views of a scene). Each camera can be oriented in a different direction and capture a different view of the scene. Image composition (stitching) can then be performed on the video frames (or images) captured by each of the cameras in the camera set in order to provide a seamless 360-degree view.
[00103] The 360-degree video can be remapped to other formats. These other formats can be used to store, transmit and/or view the 360-degree video. An exemplary format is the equirectangular format. Figure 3 illustrates an example of an equirectangular video frame 300 based on two fisheye images 302A, 302B. In this exemplary equirectangular video frame 300, the usable pixels of the two fisheye images 302A, 302B (for example, the pixels in the circular regions) have been mapped into an equirectangular format. In this example, each fisheye image 302A, 302B includes a 180-degree or greater field of view, so that together the two fisheye images 302A, 302B cover a 360-degree field of view (possibly with some overlap).
[00104] Mapping the pixels of the fisheye images 302A, 302B has the effect of eliminating the warping of the scene captured in the fisheye images 302A, 302B, and of stretching the pixels toward the edges of the video frame 300. The resulting equirectangular image may appear stretched at the top and at the bottom of the video frame 300. A well-known equirectangular projection is the Mercator projection, in which the geography of the Earth is presented with orthogonal lines of latitude and longitude.
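The linear latitude/longitude mapping that characterizes an equirectangular picture can be sketched as follows (the axis conventions and picture size are assumptions for illustration):

```python
# Map a spherical direction to pixel coordinates in an equirectangular picture.
# Yaw in [-180, 180] degrees maps linearly to x; pitch in [-90, 90] maps to y.
def sphere_to_equirect(yaw_deg: float, pitch_deg: float,
                       width: int, height: int):
    x = (yaw_deg + 180.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return x, y

print(sphere_to_equirect(0.0, 0.0, 3840, 1920))   # (1920.0, 960.0): picture center
print(sphere_to_equirect(0.0, 90.0, 3840, 1920))  # (1920.0, 0.0): pole maps to top row
```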
[00105] In several deployments, the fisheye images 302A, 302B can be mapped to other formats, such as onto the faces of a cube, a cylinder, a pyramid, a truncated pyramid or some other geometric shape. In each of these cases, the distortion present in the fisheye images 302A, 302B can be corrected, and the unusable pixels can be eliminated. The planar data can then be packed for storage and/or transmission, and can be used to display the 360-degree video.
[00106] In some cases, an intermediate format can be useful, for example, to store and / or transmit video data in 360 degrees or to convert video data into another format. For example, an equirectangular representation can be mapped to a spherical shape (for example, a spherical geometry) to display the video data, as shown in Figures 4A and 4B.
[00107] Figures 4A and 4B illustrate an example of an equirectangular video frame 400 that is used in a 360-degree video presentation. The equirectangular video frame 400 can be mapped into a spherical space to form a spherical representation 410, and the resulting spherical representation can be displayed to a viewer 420 using a head-mounted display or some other 360-degree video display device. In other examples, the equirectangular video frame 400 can be mapped to a cubic, cylindrical or pyramidal shape, or some other geometric shape, and the geometric shape can be used by the 360-degree video display device to display the video.
[00108] As noted above, an equirectangular video frame 400 can capture a full 360-degree field of view, with the pixels in the upper and lower regions appearing stretched and/or compressed. In order to use the equirectangular video frame 400 in a 360-degree video presentation, the pixels in the equirectangular video frame 400 can be mapped to the spherical representation 410. This mapping can have the effect of expanding the upper and lower regions of the equirectangular video frame 400 toward the top and bottom (for example, the north pole and the south pole, respectively) of the spherical representation. The expansion of the upper and lower regions can correct distortion in these areas that is apparent in the equirectangular video frame 400.
[00109] The mapping of the equirectangular video frame 400 to spherical representation 410 may additionally have the effect of wrapping the width of the frame around the center (e.g., the equator) of the spherical representation. The left and right edges of the
equirectangular video frame 400 can be mapped next to each other, so that no seam appears.
[00110] Once the equirectangular video frame 400 has been mapped to a spherical representation, the spherical representation can be displayed. A viewer 420, using a head-mounted display or another 360-degree video display device, can view the spherical representation from within the spherical representation. In most cases, the viewer 420 is positioned so that the floor, from the viewer's perspective, is the lowest point of the spherical representation. In some cases, it can be assumed that the user's eyes are at the center of the sphere. In various deployments, the spherical representation can be expanded or contracted to suit the height and/or position of the viewer (for example, if the viewer is sitting, standing or in some other position).
[00111] As described above, one or more systems and methods described in this document are intended to generate media files for 360-degree video content to include signaling information from one or more regions of interest (ROIs) in the video content . One or more systems and methods described in this document are also intended to process the signaling information included in the media files to extract the ROI from the video content for rendering.
[00112] As noted above, an ROI of a 360-degree video picture can be a predetermined region of the picture that captures a particular portion of the scene. In such cases, an ROI can also be referred to as a region of greatest interest. The generation and signaling of information related to ROIs can be performed using input provided by a user, based on user statistics gathered by a service or content provider, or using other appropriate techniques. In many instances, an ROI determined for a picture may include a portion of a 360-degree video content item specifically chosen to direct the audience's view, a statistically determined region of interest, or another predetermined portion of a scene. For example, a content creator (for example, a director, a producer, an author or the like) can define the regions of greatest interest in a 360-degree video content item. In such an example, the 360-degree video playback may display a dynamically changing viewing window that the director or other party wants the audience to pay attention to, even when the user is not turning their head or changing the viewing window through another user interface (UI). Such viewing windows can be provided with the omnidirectional video on a scene-by-scene basis. In another example, the ROIs in various pictures of a 360-degree video content item can be determined using statistics on which regions were requested and/or viewed the most by users when a given 360-degree video (or VR) content was provided through a streaming service. In such an example, an ROI in a 360-degree video picture can include one of the regions that are statistically most likely to be rendered to the user at the time of presentation of the picture.
[00113] Information on ROIs can be used for various purposes to improve 360-degree video performance. For example, ROI information can be used for data pre-fetching in adaptive 360-degree video streaming by servers, clients and/or other edge entities. In another example, ROI information can be used for transcoding optimization when a VR video is transcoded (for example, to a different codec, to a different projection mapping or in another transcoding operation). In other examples, ROI information can be used for cache management by a server or edge cache, for content management by a 360-degree video streaming server, or for other purposes. In some cases, ROI signaling can be performed, for example, using SEI messages in a video bit stream, using a file format sample group in a media file, using dynamic adaptive streaming over HTTP (DASH) media presentation description (MPD) elements or attributes (for example, using a sample group), and/or using other signaling mechanisms.
[00114] An ROI in a 360-degree video can be defined in at least two ways. One way is to define an ROI based on the 2D Cartesian coordinate system in a 2D picture. Another way is to define an ROI based on the spherical coordinate system (for example, by defining a region on the spherical surface of the 360-degree video).
[00115] Several methods can be used to define ROIs based on the spherical coordinate system. For example, an ROI can be defined as a region on a spherical surface that is confined by the four segments of four great circles, or that is confined by the four segments of two great circles and two small circles, each segment being between two points on the spherical surface. In this document, a circle, a great circle and a small circle are defined as follows (and are illustrated in Figure 5A and Figure 5B, described below): the intersection of a plane and a sphere is a circle (except when the intersection is a point). All points of this circle belong to the surface of the sphere. A great circle, also known as an orthodrome or Riemannian circle, of a sphere is the intersection of the sphere and a plane that passes through the center point of the sphere. The center of the sphere and the center of a great circle are always co-located. Any other intersection of a plane with a sphere that does not meet this condition can form a small circle.
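The great-circle/small-circle distinction above can be checked numerically; the sketch below assumes a unit sphere centered at the origin and a plane given in the form n·x = d:

```python
import numpy as np

# A plane intersecting the sphere yields a great circle only when it passes
# through the sphere's center, i.e. when its distance to the center is zero.
def circle_type(normal, d, radius=1.0, eps=1e-9):
    n = np.asarray(normal, dtype=float)
    dist = abs(d) / np.linalg.norm(n)  # distance from sphere center to the plane
    if dist >= radius:
        return "no circle (plane misses or only touches the sphere)"
    return "great circle" if dist < eps else "small circle"

print(circle_type([0, 0, 1], 0.0))  # equatorial plane -> great circle
print(circle_type([0, 0, 1], 0.5))  # offset plane -> small circle
```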
[00116] When a 360-degree video is played back on a head-mounted display (HMD) or a non-HMD display (for example, a TV, a mobile device, a body-worn wearable device or another suitable non-HMD display), a viewing window is rendered for the user. A viewing window can be a rectangular region on a plane that is tangent to the sphere (that intersects the sphere at a point), where the viewing window plane is orthogonal to the user's viewing direction. A viewing window can be generated by applying rectilinear projection (for example, as discussed in JVET-D1030). The region on the sphere that corresponds to a viewing window is one that is confined by the four segments of four great circles.
[00117] Several problems are present in the existing models for signaling ROIs in VR video. For example, problems can arise from signaling based only on either the spherical coordinate system (by signaling a region on the sphere) or the 2D Cartesian coordinate system (by signaling a region of a picture). Additional processing may be required to render and/or transmit the video data, which can affect the performance of the video processing system (for example, video cache, media gateway, renderer, etc.) and may cause delay in the transmission and/or rendering of the video content, which can cause a poor user experience. For example, ROI signaling based on a spherical coordinate system is beneficial from the rendering point of view, because when a particular spherical region (for example, an object in the scene) that is of interest to the user is to be rendered, and this spherical region is signaled, it can be easily identified and located from the spherical video content. However, when such spherical-based signaling is used for delivery and decoding optimizations (for example, in pre-fetching data in adaptive streaming, such as DASH), a local cache or media gateway needs to find out which set of independently coded picture regions is the minimum set that covers the signaled ROI. To do this, the cache or media gateway needs to perform geometric processing that involves the projection and region-wise mapping that were used in converting the spherical video signal into a 2D video signal before encoding. This would be a heavy processing burden for caches and media gateways. On the other hand, ROI signaling based on a 2D Cartesian coordinate system is beneficial from the point of view of delivery and decoding optimizations (for example, in pre-fetching data in adaptive streaming, such as DASH), but at the same time imposes a burden for rendering, since players or clients need to apply the inverse geometric processing of the projection and region-wise mapping when it is necessary to find out which region on the sphere is covered by the independently coded picture regions (which are signaled as the ROI).
[00118] Another problem is the fact that when an ROI based on the spherical coordinate system is signaled as a region on the sphere, in order to discover the
dimension (or dimensions) (for example, the width and height) of the viewing window that corresponds to the region, it may be necessary to apply a rectilinear projection. However, this information may be needed during session negotiation or content selection, during which applying the rectilinear projection process to determine the dimensions is a burden.
[00119] In some cases, problems may arise when the region on a spherical surface that is confined by the four segments of two great circles and two small circles does not correspond to the viewing window. For example, a viewing window may correspond to a non-rectangular region in a projected 2D equirectangular picture (for example, the entire viewing window 520 shown in Figure 5A), while the region on the spherical surface that is confined by the four segments of two great circles and two small circles may correspond to only a subset of the viewing window region (for example, only the rectangular region within the non-rectangular region of the viewing window 520). In some cases, it is also possible for the rectangular region to include the non-rectangular region (the non-rectangular region being a subset of the rectangular region). However, the rectangular region and the non-rectangular region may never exactly match each other.
[00120] A media file generated for 360-degree video content using the systems and methods described in this document may include first signaling information and second signaling information for an ROI. The first signaling information can include spherical information that defines a first location of the ROI and dimension information of the ROI in a three-dimensional spherical space corresponding to the spherical video. The second signaling information may include 2D information that defines a second location of the ROI in a two-dimensional space formed by projecting the spherical space onto a plane. In some instances, the dual signaling can provide a mapping between the first location and the second location of the ROI, which can facilitate both the transmission and the rendering of the spherical video data. For example, the spherical video data can be transmitted by a streaming application in the form of two-dimensional video frames. As described above, the two-dimensional video frames can be formed by making a projection (for example, an equirectangular projection or another suitable projection) of the spherical video data onto a two-dimensional plane. In order to render an ROI based, for example, on a portion of the scene predetermined to be of interest (for example, an instruction to render a director's cut, a statistically determined region or other suitable information), a region corresponding to the ROI can be identified in the spherical video based on the first location. In addition, based on the mapping between the first location and the second location, the streaming application can determine which regions of the two-dimensional video frames should be pre-fetched for rendering of the ROI. Furthermore, after obtaining the regions of the two-dimensional video frames, a media player or renderer can identify the pixels of the regions corresponding to the ROI based, for example, on the dimension information of the ROI, and can render the extracted pixels.
[00121] Figure 4A is a diagram illustrating a region of interest (ROI) 430. The ROI 430 can comprise a subset of the pixels included in the equirectangular video frame 400. As described above, the ROI 430 can correspond, for example, to a predetermined region of interest to be presented as a current field of view (FOV) of the viewer 420. The predetermined ROI may correspond, for example, to a director's cut to guide the viewer 420 through a predetermined set of views of a scene, to a statistically determined region of the frame, or the like. In some examples, the ROI 430 may also correspond, for example, to a viewing direction of the viewer 420 with respect to the spherical representation 410, so that the viewer 420 can control which portion of the scene to view. The ROI 430 can then be mapped to form a viewing window to be rendered by the viewing device used by the viewer 420. A distinct characteristic of 360-degree video compared to normal video (non-360-degree or non-VR video) is the fact that, in a 360-degree video, typically only a subset of the entire video region represented by the video pictures (corresponding to the field of view (FOV) or viewing window of the display device) is displayed, whereas in normal video applications typically the entire video region is displayed. The FOV or viewing window is the area that is currently presented by the display device and that is viewed by the user.
[00122] Figure 4B is a diagram illustrating an example of a viewing window 460 corresponding to the ROI 430. The viewing window 460 can be a region on a plane that is tangent to the spherical space that forms the spherical representation 410. The viewing window 460 can be formed by making a rectilinear projection of the ROI 430 onto the plane. In the example of Figure 4B, the viewing window plane can intersect the spherical space of the spherical representation 410 at one point and can be orthogonal to the viewing direction of the user 420.
[00123] Figure 4C is a diagram illustrating an example of representing a location of the viewing window 460 within the spherical space of the spherical representation 410. In the example of Figure 4C, the location of the viewing window 460 can be represented by a tilt angle 462 and a yaw angle 464. Both angles can be derived from a viewing direction of the user based on the location of an ROI in the spherical scene. For example, a viewing direction of the user positioned at the spherical center 472 toward a viewing window center 474 of the viewing window 460 can be represented by a vector 470. The vector 470 can form a projection 476 on the x-z plane and a projection 478 on the x-y plane. The tilt angle 462 can be formed between the projection 476 and an axis 480 that is parallel to the y axis. The yaw angle 464 can be formed between the projection 478 and the axis 480.
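A small sketch of deriving the two angles from a viewing-direction vector is shown below; the axis conventions (yaw measured in the x-y plane, tilt measured as elevation out of that plane) are assumptions for illustration and may differ from the exact construction in Figure 4C:

```python
import math

def view_vector_to_angles(x: float, y: float, z: float):
    """Return (yaw_deg, tilt_deg) for a viewing direction (x, y, z)."""
    yaw = math.degrees(math.atan2(y, x))                                  # rotation in the x-y plane
    tilt = math.degrees(math.asin(z / math.sqrt(x * x + y * y + z * z)))  # elevation
    return yaw, tilt

print(view_vector_to_angles(1.0, 0.0, 0.0))  # looking along +x: yaw 0, tilt 0
print(view_vector_to_angles(0.0, 0.0, 1.0))  # looking straight up: tilt 90
```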
[00124] Both the tilt angle 462 and the yaw angle 464 can relate the location of the viewing window 460 to an orientation of the user's head and/or eyes. For example, the tilt angle 462 can represent an elevation angle of the vector 470 that can correspond, for example, to an elevation angle of the user's head with respect to the x-z plane, a rotation of the user's eyes with respect to the x-z plane, or any other movement of the user with respect to the x-z plane. In addition, the yaw angle 464 can represent a rotation angle of the vector 470, which can correspond, for example, to a rotation angle of the user's head, a rotation of the user's head with respect to the x-y plane, a rotation of the user's eyes with respect to the x-y plane, or any other movement of the user with respect to the x-y plane. By representing the location of the viewing window 460 based on the tilt angle 462 and the yaw angle 464, a location of the region of interest (ROI) represented by the viewing window 460 can be efficiently determined based on the orientation of the user's head and/or eyes, which enables efficient rendering of the portion of the spherical video content corresponding to the ROI.
[00125] In addition to the center 474 of the
viewing window 460, other attributes of the viewing window 460 can also be represented based on the yaw angle 464 and the tilt angle 462. For example, with reference to Figure 4D, the intermediate points 482, 484, 486 and 488 can be intermediate points between the edges of the viewing window 460. The distance between the intermediate points 484 and 488 can define, for example, a height of the viewing window 460, while the distance between the intermediate points 482 and 486 can define, for example, a width of the viewing window 460. The height of the viewing window 460 can be represented by a delta tilt angle 490 formed by the spherical center 472, the intermediate point 484 and the intermediate point 488. In addition, referring to Figure 4E, which illustrates a different perspective of the viewing window 460 of Figures 4C to 4D, the width of the viewing window 460 can also be represented by a delta yaw angle 492 formed between the spherical center 472, the intermediate point 482 and the intermediate point 486. The location, height and width of the viewing window 460 may represent a result of rectilinear projections of a predetermined location, a predetermined height and a predetermined width of the ROI 430 onto a plane corresponding to the viewing window 460.
[00126] Together with the tilt angle 462 and the yaw angle 464, the delta tilt angle 490 and the delta yaw angle 492 can define a location and a dimension of the viewing window 460 (and of the ROI) in the spherical space and based on an orientation of the user's head and/or eyes. As will be discussed in more detail below, the location and dimension information of the viewing window 460 can be part of the first signaling information included in a media file. The media file can be, for example, an ISO-based media file that encapsulates a bit stream of a set of two-dimensional video frames generated for rendering/streaming the spherical video. The media file can also include a timed metadata track used for streaming the bit stream. The media file can also include second signaling information for a particular region (or regions) of the two-dimensional video frames that includes the ROI. The first signaling information and the second signaling information can be mapped to each other in the media file to signal the ROI. Based on the mapping, the regions of the two-dimensional video frame that include the ROI can be pre-fetched and provided to the renderer. In addition, the renderer can extract the pixels of the video frame regions that represent the ROI based on the dimension information of the viewing window 460, and render the pixels for display. As a result, additional processing (for example, performing the rectilinear projection or an inverse rectilinear projection) can be reduced, which can improve the performance of the video processing system as well as the user experience.
[00127] Although Figures 4A to 4E illustrate that
the viewing window 460 has a rectangular shape, a viewing window can have other shapes. The shape of a viewing window can be determined based on how a region corresponding to the viewing window (for example, the ROI 430) is defined geometrically in the spherical representation 410. Reference is now made to Figures 5A to 5C, which illustrate different geometric definitions of the ROI 430. In Figure 5A, a region 501 can be defined by circles 502, 504, 506 and 508. Each of the circles 502, 504, 506 and 508 can be referred to as a great circle. Each of the circles 502, 504, 506 and 508 can be formed by the intersection of the spherical space of the spherical representation 410 and a plane that passes through the spherical center 472. In Figure 5B, a region 509 can be defined by the circles 502 and 504 and by circles 516 and 518. As discussed above, the circles 502 and 504 can be referred to as great circles. In contrast, the circles 516 and 518 are referred to as small circles, which can be formed by the intersection of the spherical space of the spherical representation 410 and a plane that does not pass through the spherical center 472.
[00128] The geometric definition of the ROI 430 (for example, whether it is defined by four great circles or by two great circles and two small circles) can determine the shape and dimension of the corresponding viewing windows. Reference is now made to Figure 5C, which illustrates a comparison between the viewing window 520 and the rectangular region 530. As shown in Figure 5C, the rectangular region 530 is smaller and includes fewer pixels than the viewing window 520. A larger viewing window is preferred because it corresponds to what can be viewed on an HMD or other displays and, for example, more pixels can be displayed to the user. In some deployments, in order to maximize the number of pixels provided to a user in a viewing window, an ROI is signaled in a media file only if the region corresponding to the ROI is formed only by great circles. Such a constraint can also improve uniformity and predictability in rendering the viewing window. For example, with reference to Figure 5C, a renderer can render the viewing window in the form of the viewing window 520 instead of the rectangular region 530, and interpret, for example, the height of the viewing window (for example, represented by the delta tilt angle) as representing the height h between the upper and lower curved edges of the viewing window 520 instead of the height h' between the upper and lower straight edges of the rectangular region 530.
[00129] Figure 6A illustrates a set of two-dimensional video frames 602a, 602b to 602n. Each of the two-dimensional video frames 602a, 602b to 602n corresponds to a video frame of the spherical representation 410. Each two-dimensional video frame 602a, 602b to 602n can be formed by performing, for example, a rectilinear projection of the corresponding video frame of the spherical representation 410 onto a two-dimensional plane. The two-dimensional video frames 602a, 602b to 602n can be encoded into a video bit stream for transmission.
[00130] Each of the two-dimensional video frames 602a, 602b to 602n can be divided into a set of tiles. The tiles in the video frames 602a, 602b to 602n can be motion-constrained tiles, and all the pictures in a layer can have the same tile structure. In such cases, the tiles have the same locations across all the frames of a given bit stream layer. For example, a motion-constrained tile is a tile region at a particular location in a picture that can be coded using only one or more tiles at the same location in other pictures. For example, only the region of a reference picture that is within a particular tile location can be used to encode or decode a tile at that particular tile location in a current picture. Only the tiles of the pictures that are required to display a current viewing window of a display device need to be provided for display. As shown in Figure 6A, each tile has a designated location across all the different video frames 602a, 602b to 602n. In one example, a first tile has a location of (0, 0) in 602a, 602b to 602n, and the first tile can be identified based on that location. In some cases, the tiles may be numbered, such as tile numbers 0 to 23, tile numbers 1 to 24 or another suitable numbering. As shown in Figure 6A, the tiles do not overlap each other. Each of the two-dimensional video frames 602a, 602b to 602n can also include one or more ROIs (or viewing windows) projected from a corresponding frame of the spherical representation 410. For example, as shown in Figure 6B, the viewing window 520 may be located in a group of tiles at the locations (1, 1), (1, 2), (2, 1) and (2, 2).
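A minimal sketch of locating the tiles that a viewing window overlaps in the projected picture is given below; the picture size and tile grid are hypothetical and chosen only so that the result matches the example tile locations above:

```python
# Return the (row, column) indices of all tiles that overlap the 2D bounding
# box [x0, x1) x [y0, y1) of a viewing window in the projected picture.
def tiles_covering(x0, y0, x1, y1, pic_w, pic_h, cols, rows):
    tile_w, tile_h = pic_w / cols, pic_h / rows
    c0, c1 = int(x0 // tile_w), int((x1 - 1) // tile_w)
    r0, r1 = int(y0 // tile_h), int((y1 - 1) // tile_h)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 3840x1920 picture divided into a 6x4 grid of 640x480 tiles:
print(tiles_covering(700, 500, 1900, 1400, 3840, 1920, cols=6, rows=4))
# -> [(1, 1), (1, 2), (2, 1), (2, 2)]
```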
[00131] As discussed above, a media file can be generated to encapsulate a bit stream formed by encoding the video frames 602a, 602b to 602n. A media file can also be generated to include a timed metadata track (in addition to the track (or tracks) that carries the media bit stream) used for streaming the bit stream. The media file can include the first signaling information and the second signaling information described above for an ROI (corresponding to a viewing window), to facilitate the transmission and rendering of the ROI/viewing window. The first signaling information may include a location and a dimension of the viewing window in the spherical space (for example, represented by the yaw angle, the tilt angle, the delta yaw angle and the delta tilt angle). The second signaling information may include a location of the viewing window in the two-dimensional video frames. The location of the viewing window in the two-dimensional video frames can be represented, for example, by the locations (or identifiers) of the tiles that include the viewing window. For the example of Figure 6B, the second signaling information may include the tile locations (1, 1), (1, 2), (2, 1) and (2, 2) (or the identifiers/numbers of those tiles) to signal the ROI.
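As a purely illustrative sketch (the field names below are assumptions, not the syntax defined by this application), the two kinds of signaling information for one ROI could be carried together as follows:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class RoiSignaling:
    # First signaling information: viewport center and dimension in spherical space.
    yaw_deg: float
    tilt_deg: float
    delta_yaw_deg: float     # angular width of the viewport
    delta_tilt_deg: float    # angular height of the viewport
    # Second signaling information: tile locations in the projected 2D picture.
    tiles: List[Tuple[int, int]] = field(default_factory=list)

roi = RoiSignaling(yaw_deg=30.0, tilt_deg=-10.0,
                   delta_yaw_deg=90.0, delta_tilt_deg=60.0,
                   tiles=[(1, 1), (1, 2), (2, 1), (2, 2)])
print(roi.tiles)  # the regions to pre-fetch for this viewport
```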
[00132] As discussed above, the first signaling information and the second signaling information can be mapped to each other in the media file to signal the ROI. The mapping enables efficient transmission and rendering of the viewing window to the user. For example, a video processing system may receive an instruction to render a predetermined region of interest in the spherical video 410 to the user. The instruction may include, for example, the yaw angle and the tilt angle of the center of the specified region. Based on matching the input yaw angle and tilt angle against the first signaling information, the video processing system can refer to the mapping between the first signaling information and the second signaling information in the media file to determine the set of tiles (or other units of pixels) in the video frame 602a that contains, for example, the viewing window 520. Furthermore, based on the tilt angle, the yaw angle, the delta tilt angle, the delta yaw angle and a determination of a particular viewing window shape (for example, based on the restriction that the predetermined region in the spherical video 410 is defined based on four great circles), a renderer can also determine a location and a boundary of the viewing window 520 within the tiles, and extract the pixels within the boundary of the viewing window 520 for rendering. Such processing can be performed with minimal geometric processing, which can improve system performance as well as the user experience.
[00133] Reference is now made to Figure 7, which illustrates an example of an ISO-based media file 700 that contains ROI signaling information. The file 700 can be formatted according to the ISOBMFF. The ISOBMFF is designed to contain timed media information in a flexible and extensible format that facilitates the interchange, management, editing and presentation of the media. The media presentation can be local to the system containing the presentation, or the presentation can take place through a network or another stream delivery mechanism.
[00134] A presentation, as defined by the ISOBMFF specification, is a sequence of pictures, often related by having been captured sequentially by a video capture device, or related for some other reason. In this document, a presentation may also be referred to as a movie or video presentation. A presentation can include audio. A single presentation can be contained in one or more files, with one file containing the metadata for the entire presentation. The metadata includes information such as timing and framing data, descriptors, pointers, parameters, and other information that describes the presentation. The metadata does not include the video and/or audio data per se. Files other than the file containing the metadata need not be formatted according to the ISOBMFF, and only need to be formatted so that these files can be referenced by the metadata.
[00135] The file structure of an ISO-based media file is object oriented, and the structure of an individual object in the file can be inferred directly from the type of the object. The objects in an ISO-based media file are referred to as boxes by the ISOBMFF specification. An ISO-based media file is structured as a sequence of boxes, which can contain other boxes. Boxes generally include a header that provides a size and a type for the box. The size describes the entire size of the box, including the header, the fields and all boxes contained within the box. Boxes with a type that is not recognized by a player device are typically ignored and skipped.
[00136] As illustrated by the example in Figure 7, at the top level of the file, an ISO-based media file 700 can include a file type box 710, a movie box 720 and one or more movie fragment boxes 730a, 730b ... 730n. Other boxes that can be included at this level, but that are not represented in this example, include free space boxes, metadata boxes and media data boxes, among others.
[00137] An ISO-based media file can include a file type box 710, identified by the box type ftyp. The file type box 710 identifies an ISOBMFF specification that is the most suitable for parsing the file. "Most" in this case means that the ISO-based media file 700 may have been formatted according to a particular ISOBMFF specification, but is probably compatible with other iterations of the specification. This most suitable specification is referred to as the major brand. A player device can use the major brand to determine whether the device is capable of decoding and displaying the contents of the file. The file type box 710 can also include a version number, which can be used to indicate a version of the ISOBMFF specification. The file type box 710 can also include a list of compatible brands, which includes a list of other brands with which the file is compatible. An ISO-based media file can support more than one major brand.
[00138] An ISO-based media file can additionally include a movie box 720 that contains the metadata for the presentation. The movie box 720 is identified by the box type moov. ISO/IEC 14496-12 provides that a presentation, whether contained in one file or in multiple files, can include only one movie box 720. Often, the movie box 720 is close to the beginning of an ISO-based media file. The movie box 720 includes a movie header box 722 and can include one or more track boxes 724, as well as other boxes.
[00139] The movie header box 722, identified by the box type mvhd, may include information that is media independent and relevant to the presentation as a whole. For example, the movie header box 722 may include information such as a creation time, a modification time, a timescale and/or a duration for the presentation, among other things. The movie header box 722 can also include an identifier that identifies the next track in the presentation. For example, the identifier can point to the track box 724 contained by the movie box 720 in the illustrated example.
[00140] The track box 724, identified by the box type trak, can contain the information for a track of the presentation. A presentation can include one or more tracks, with each track being independent of the other tracks in the presentation. Each track can include temporal and spatial information that is specific to the content in the track, and each track can be associated with a media box. The data in a track can be media data, in which case the track is a media track, or the data can be packetization information for streaming protocols, in which case the track is a hint track. Media data includes, for example, video and audio data. In the illustrated example, the exemplary track box 724 includes a track header box 724a and a media box 724b. A track box can include other boxes, such as a track reference box, a track group box, an edit box, a user data box, a meta box, and others. As will be discussed in detail below, the media box 724b may include signaling information for one or more ROIs.
[00141] The track header box 724a, identified by the box type tkhd, can specify the characteristics of a track contained in the track box 724. For example, the track header box 724a may include a creation time, a modification time, a duration, a track identifier, a layer identifier, a group identifier, a volume, a width and/or a height of the track, among other things. For a media track, the track header box 724a can additionally identify whether the track is enabled, whether the track should be played as part of the presentation, or whether the track can be used to preview the presentation, among other things. Presentation of a track is generally assumed to be at the beginning of a presentation. The track box 724 may include an edit list box, not illustrated in this document, which may include a timeline map. The timeline map can specify, among other things, an offset time for the track, with the offset indicating a start time, after the start of the presentation, for the track.
[00142] In the illustrated example, the track box 724 also includes a media box 724b, identified by the box type mdia. The media box 724b can contain objects and information about the media data in the track. For example, the media box 724b may contain a handler reference box, which can identify the media type of the track and the process by which the media in the track is presented. As another example, the media box 724b may contain a media information box, which can specify the characteristics of the media in the track. The media information box can additionally include a table of samples, where each sample describes a piece of media data (for example, video or audio data) including, for example, the location of the data for the sample. The data for a sample is stored in a media data box, discussed further below. As with most other boxes, the media box 724b can also include a media header box.
[00143] In the illustrated example, an exemplary ISO-based media file 700 also includes multiple fragments 730a, 730b, ... 730n of the presentation. The fragments 730a, 730b, ... 730n are not ISOBMFF boxes; rather, they describe a combination of boxes that includes a movie fragment box 732 and one or more media data boxes 738 that are referenced by the movie fragment box 732. The movie fragment box 732 and the media data boxes 738 are top-level boxes, but are grouped here to indicate the relationship between a movie fragment box 732 and a media data box 738.
[00144] The movie fragment header box 734, identified by the box type mfhd, can include a sequence number. A player can use the sequence number to verify that the fragment 730a includes the next piece of data for the presentation. In some cases, the contents of a file, or the files for a presentation, can be provided to a playback device out of order. For example, network packets can arrive in an order other than the order in which the packets were originally transmitted. In such cases, the sequence number can assist a player device in determining the correct order for the fragments.
[00145] The movie fragment box 732 may also include one or more track fragment boxes 736, identified by the box type traf. A movie fragment box 732 can include a set of track fragments, zero or more per track. The track fragments can contain zero or more track runs, each of which describes a contiguous run of samples for a track. Track fragments can be used to add empty time to a track, in addition to adding samples to the track.
[00146] The media data box 738, identified by the box type mdat, contains media data. In video tracks, the media data box 738 contains video frames. A media data box can, alternatively or additionally, include audio data. A presentation can include zero or more media data boxes, contained in one or more individual files. The media data is described by metadata. In the illustrated example, the media data in the media data box 738 can be described by metadata included in the track fragment box 736. In other examples, the media data in a media data box can be described by metadata in the movie box 720. The metadata can refer to particular media data by an absolute offset within the file 700, so that a media data header and/or free space within the media data box 738 can be skipped.
[00147] Other fragments 730b, 730c, ... 730n in the ISO-based media file 700 may contain boxes similar to those illustrated for the first fragment 730a and/or may contain other boxes.
[00148] Figure 8 illustrates an example of a media box 840 that can be included in an ISO-based media file. As discussed above, a media box can be included in a track box and can contain objects and information that describe the media data in the track. In the illustrated example, the media box 840 includes a media information box 842. The media box 840 can also include other boxes, which are not illustrated here.
[00149] The media information box 842 can contain objects that describe characteristic information about the media in the track. For example, the media information box 842 may include a data information box, which describes the location of the media information in the track. As another example, the media information box 842 may include a video media header, when the track includes video data. The video media header can contain general presentation information that is independent of the coding of the video media. The media information box 842 can also include a sound media header when the track includes audio data.
[00150] The media information box 842 can also include a sample table box 844, as provided in the illustrated example. The sample table box 844, identified by the box type stbl, can provide locations (for example, locations within a file) for the media samples in the track, as well as time information for the samples. Using the information provided by the sample table box 844, a player device can locate samples in the correct time order, determine the type of a sample, and/or determine the size, container and offset of a sample within a container, among other things.
[00151] The sample table box 844 can include a sample description box 846, identified by the box type stsd. The sample description box 846 can provide detailed information about, for example, the type of coding used for a sample and any initialization information needed for that type of coding. The information stored in the sample description box can be specific to the type of track that includes the samples. For example, one format can be used for the sample description when the track is a video track, and a different format can be used when the track is a hint track. As an additional example, the format for the sample description can also vary depending on the format of the hint track.
[00152] The sample description box 846 can include sample entry boxes 848a ... 848n. The sample entry is an abstract class, and so the sample description box typically includes a specific sample entry box, such as a visual sample entry for video data or an audio sample entry for audio samples, among other examples. Each visual sample entry for video data can include one or more video frames. The video frames can be, for example, the two-dimensional video frames 602a, 602b to 602n generated from the spherical representation 410. A sample entry box can store the parameters for a particular sample. For example, for a video sample, the sample entry box can include a width, a height, a horizontal resolution, a vertical resolution, a frame count and/or a depth for the video sample, among other things. As another example, for an audio sample, the sample entry can include a channel count, a channel layout and/or a sample rate, among other things.
[00153] In addition to the sample entry boxes, the sample description box 846 may additionally include a sample group description box 860 (identified by the box type sgpd) and a sample-to-group box 862 (identified by the box type sbgp). Both the sample group description box 860 and the sample-to-group box 862 can be part of a sample grouping mechanism to signal that a set of sample entries includes one or more ROIs and to signal the locations and dimensions of the one or more ROIs in the set of sample entries. In the example of Figure 8, the sample group description box 860 can include a sample group type entry 861. The sample group type entry 861 can include an ROI group type to signal that the sample entry type includes ROI information. The sample group type entry 861 can additionally include syntax elements that indicate the pixel coordinates of the ROI in a two-dimensional video frame, as well as a yaw angle, a tilt angle, a delta yaw angle and a delta tilt angle of the ROI in spherical space. The sample-to-group box 862 further indicates that the ROI information in the sample group type entry 861 should be applied to certain sample entries in the sample description box 846. With this information, the video samples containing the ROI can be identified and delivered more efficiently to the renderer for rendering.
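To make the sample grouping mechanism concrete, the following Python sketch models a hypothetical ROI sample group description entry and resolves which samples a sample-to-group mapping associates with it. The field names, the run structure and the convention that a group description index of 0 means "no group" are assumptions made for illustration; they are not the syntax defined by this disclosure or by ISOBMFF.

from dataclasses import dataclass
from typing import List

@dataclass
class RoiSampleGroupEntry:
    # Hypothetical ROI sample group description entry (illustrative field names).
    center_x: int        # ROI center in the two-dimensional frame, in pixels
    center_y: int
    width: int           # ROI dimensions in pixels
    height: int
    center_yaw: float    # ROI center in spherical space
    center_tilt: float
    delta_yaw: float     # angular width of the ROI
    delta_tilt: float    # angular height of the ROI

@dataclass
class SampleToGroupRun:
    sample_count: int             # number of consecutive samples in this run
    group_description_index: int  # 1-based index into the group entries, 0 = no group

def samples_with_roi(runs: List[SampleToGroupRun]) -> List[int]:
    # Return the indices of samples that the sample-to-group mapping ties to an ROI entry.
    indices, sample_index = [], 0
    for run in runs:
        for _ in range(run.sample_count):
            if run.group_description_index != 0:
                indices.append(sample_index)
            sample_index += 1
    return indices

With such a mapping, a player can restrict decoding and delivery to the samples that actually carry the ROI.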
[00154] Some video systems support streaming media data over a network, in addition to supporting media playback. For example, the media data can be stored in one or more files formatted according to the ISO base media file format (for example, ISOBMFF). The media file can include a movie presentation and can include hint tracks that contain instructions that can assist a streaming server in forming and streaming the file or files as packets. These instructions can include, for example, data for the server to send (for example, header information) or references to segments of the media data. A file can include separate hint tracks for different streaming protocols. Hint tracks can also be added to a file without the need to reformat the file.
[00155] Reference is now made to Figure 9, which illustrates an exemplary system 900 for streaming. System 900 includes a server 902 and a client device 904 communicatively coupled to each other via network 906 based on a network protocol. For example, server 902 can include a conventional HTTP web server, and client device 904 can include a conventional HTTP client. An HTTP communication channel can be established, with the client device 904 able to transmit an HTTP request to server 902 to request one or more network resources. Server 902 can transmit an HTTP response back to client device 904 including the requested network resource (or requested network resources). An example of a network resource hosted by server 902 can be media content, which can be divided into media segments. A media segment can include a sequence of video frames. The client device 904 may include a streaming application 908 to establish a streaming session with server 902 over network 906. During the streaming session, the streaming application 908 can transmit a request for one or more media segments to a request processor 910 of server 902 over network 906. The streaming application 908 can receive the one or more requested media segments and can render some or all of the received media segments on the client device 904 before transmitting a subsequent request for other media segments. With the use of such HTTP streaming, the streaming application 908 does not have to wait until all of the media content has been downloaded before rendering the media content on the client device 904, which can facilitate better use of network resources and enhance the user experience.
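As an illustration of this request-and-render pattern, the sketch below issues one HTTP GET per media segment with Python's standard urllib and hands each segment to a renderer callback as soon as it arrives. The URL layout, the segment names and the render callback are placeholders, not interfaces defined by this disclosure.

import urllib.request

def fetch_segment(base_url: str, segment_name: str) -> bytes:
    # One HTTP GET for one media segment; the URL layout is assumed for illustration.
    with urllib.request.urlopen(f"{base_url}/{segment_name}") as response:
        return response.read()

def stream(base_url, segment_names, render):
    # Request segments one by one and render each as soon as it is received.
    for name in segment_names:
        data = fetch_segment(base_url, name)
        render(data)  # rendering can start before later segments are downloaded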
[00156] In order to enable the streaming of high quality media content using conventional HTTP web servers, adaptive bit rate streaming can be used. With adaptive bit rate streaming, for each media segment, client device 904 can be provided with information about a set of alternate segment files 920 and 940. In the present context, a media segment can refer to a portion of a media bit stream associated with a particular timestamp and playback duration. Each set of alternate segment files 920 and 940 can correspond to a particular representation of the media segment (for example, associated with a particular timestamp and playback duration). A representation can refer to a particular result of encoding given media content with a different quality (for example, with a different bit rate, frame rate or the like). Within each set of media segment files, each media segment file can be associated with a set of properties including, for example, a particular bit rate, frame rate, resolution, audio language or the like. Based on local information (for example, the bandwidth of network 906, the decoding/display capabilities of client device 904, user preference or other information), the streaming application 908 can select a segment file from each set of media segment files. As an illustrative example, client device 904 can transmit a request for a media segment file of the media segment files 920 that is associated with a first resolution. Subsequently, due to a change in the bandwidth of network 906, client device 904 can transmit another request for a media segment file associated with a second resolution.
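A minimal sketch of such a rate-adaptation decision is shown below. The dictionary shape used for the alternative segment files and the safety factor are assumptions made for illustration only.

def select_segment_file(segment_files, measured_bandwidth_bps, safety_factor=0.8):
    # Pick the highest-bit-rate alternative that fits within the measured bandwidth.
    # segment_files: list of dicts such as {"url": ..., "bandwidth": ...} (assumed shape).
    affordable = [f for f in segment_files
                  if f["bandwidth"] <= measured_bandwidth_bps * safety_factor]
    if not affordable:
        # Fall back to the lowest-quality alternative when none fits.
        return min(segment_files, key=lambda f: f["bandwidth"])
    return max(affordable, key=lambda f: f["bandwidth"])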
[00157] Information about the sets of alternate segment files 920 and 940 can be part of a description file 960 (or manifest file) maintained by server 902. Client device 904 can obtain the description file 960 from server 902 and can transmit requests for media segment files based on the description file 960. The description file 960 may include, for example, a list of the set of alternative media segment files for each representation of the media content, and the properties associated with each alternative media segment file (for example, bit rate, frame rate, resolution, audio language, etc.). The description file 960 can also include location identifiers (for example, a Uniform Resource Locator (URL), a Uniform Resource Identifier (URI), etc.) associated with the storage locations of the alternative media segment files.
[00158] There are several protocols for adaptive bit rate streaming. One example is Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP), or DASH (defined in ISO/IEC 23009-1:2014). DASH is also known as MPEG-DASH. According to DASH, the description file 960 may include a media presentation description (MPD). Figure 10 is a diagram that illustrates an example of an MPD 1001. In some cases, the MPD 1001 can be represented in Extensible Markup Language (XML). MPD 1001 can include a set of elements that define an adaptation set 1002. The adaptation set 1002 may include a set of alternative representations 1003 and 1004. A person of ordinary skill in the art will appreciate that adaptation set 1002 may include additional representations beyond representations 1003 and 1004. Each alternative representation 1003 and 1004 can be associated with a particular bit rate, resolution or other quality, and may include a set of media segments. For example, representation 1003 includes media segments 1007 and 1009 as well as header information 1005. Representation 1004 includes media segments 1008 and 1010 as well as header information 1006. Header information 1005 and 1006 can include, for example, the Representation element (for example, including the identifier, bandwidth, width and height attributes, or the like). Each of the media segments 1007 and 1009 can be associated in MPD 1001 with a URL of a media segment file, which can be denoted as the SegmentURL element. Each of the elements in MPD 1001 can be associated with a set of attributes that define the properties of, for example, adaptation set 1002, representations 1003 and/or 1004, or other information.
[00159] The following is an example of part of an MPD:
<AdaptationSet mimeType="video/mp2t">
  <Representation id="720p" bandwidth="3200000" width="1280" height="720">
    <SegmentURL media="segment-1.DASH"/>
    <SegmentURL media="segment-2.DASH"/>
[00160] In the exemplary MPD shown above, texts such as Period, AdaptationSet, Representation, SegmentURL, etc. are elements, whereas mimeType, id, bandwidth, width, height, media, etc. are attributes. In this example, the adaptation set includes a representation associated with a particular bandwidth and frame size and includes a set of media segments represented by their URLs.
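For illustration, the sketch below parses a fragment like the one above with Python's standard xml.etree.ElementTree and lists the representations and segment URLs it finds. Closing tags are added so the fragment is well formed; a real MPD would also carry the DASH XML namespace and a Period element, so element lookups would then need to be namespace qualified.

import xml.etree.ElementTree as ET

MPD_FRAGMENT = """
<AdaptationSet mimeType="video/mp2t">
  <Representation id="720p" bandwidth="3200000" width="1280" height="720">
    <SegmentURL media="segment-1.DASH"/>
    <SegmentURL media="segment-2.DASH"/>
  </Representation>
</AdaptationSet>
"""

root = ET.fromstring(MPD_FRAGMENT)
for representation in root.iter("Representation"):
    # Print the quality properties of each representation and its segment URLs.
    print("representation", representation.get("id"), "bandwidth", representation.get("bandwidth"))
    for segment in representation.iter("SegmentURL"):
        print("  segment", segment.get("media"))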
[00161] An MPD file can include signaling information for an ROI. Reference is now made to Figure 11, which shows an XML code representation of an example of an MPD 1100. MPD 1100 can include a listing of at least one adaptation set. In MPD 1100, an adaptation set can include elements to define multiple alternative representations associated with different bit rates, resolutions or other qualities. Each representation can be associated with a picture file, and the MPD 1100 can include a link (for example, a Uniform Resource Locator (URL), a Uniform Resource Identifier (URI) or any other suitable information) to locate the picture file for each of the representations. In a case in which the picture file associated with a representation includes an ROI, the representation elements may additionally include the first signaling information and the second signaling information associated with the ROI.
[00162] As shown, an adaptation set is defined to include multiple representations, including a representation that has a representation ID equal to 2. MPD 1100 indicates that the representation with the representation ID equal to 2 has a width of 3840 pixels, a height of 1920 pixels and a frame rate of 60, among other characteristics. MPD 1100 additionally includes a URL for the video file video1.mp4 for the representation. An EssentialProperty element 1102 is provided for the representation with the representation ID equal to 2. The EssentialProperty element 1102 can describe information about projection types, FOV directions, region-wise mapping and/or other information. For example, this information can be contained in the MPD 1100 using EssentialProperty, in which case a different schemeIdUri can be defined for each type of information. In an illustrative example, if the schemeIdUri urn:mpeg:dash:360VideoProjection:2017 is associated with a projection type and CMP means cube map projection, then information about the cube map projection type can be defined in the EssentialProperty element as follows: <EssentialProperty schemeIdUri="urn:mpeg:dash:360VideoProjection:2017" value="CMP"/>.
[00163] In addition, the SupplementalProperty element 1104 may contain signaling information for an ROI. For example, the schemeIdUri urn:mpeg:dash:ROIpixelrep:2017 can be associated with a set of values to signal a central location and a dimension of an ROI in a two-dimensional frame. The location and dimension can be represented in pixel coordinates. In the example of Figure 11, the central location of the ROI can be (1300, 500), which indicates that the left offset of the central location is 1300 pixels and the top offset of the central location is 500 pixels. In addition, the ROI spans a width of 100 pixels and a height of 200 pixels. Although in the example of Figure 11 the location and dimension are represented in pixel coordinates, it is understood that they can be represented in other forms, such as tiles. For example, the location and dimension can be signaled by listing the tiles that include the ROI, or by a group identifier associated with a tile group that includes the ROI.
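Assuming, as in the example above, that the central location is given as left and top offsets within the two-dimensional frame and that the signaled width and height are centered on it, the pixel extent of the ROI can be recovered as in the sketch below; the function name and the half-width convention are illustrative assumptions.

def roi_pixel_bounds(center_left_offset, center_top_offset, width, height):
    # Convert the signaled center position and dimensions into a pixel bounding box.
    left = center_left_offset - width // 2
    top = center_top_offset - height // 2
    return left, top, left + width, top + height

# For the values signaled in Figure 11 this yields the box (1250, 400, 1350, 600).
print(roi_pixel_bounds(1300, 500, 100, 200))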
[00164] In addition, the schemeIdUri urn:mpeg:dash:ROIsphererep:2017 can be associated with a set of values to signal a central location and a dimension of an ROI in spherical space. In the example of Figure 11, the ROI yaw angle can be 20 radians, the ROI tilt angle can be 30 radians, the ROI delta tilt angle can be 10 radians, and the ROI delta yaw angle can be 10 radians.
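Under the assumption that the delta angles give the full angular width and height of the viewport region centered on the signaled yaw and tilt values, the angular extent could be derived as in the sketch below; this interpretation is made for illustration and is not a definition taken from the disclosure.

def roi_spherical_bounds(center_yaw, center_tilt, delta_yaw, delta_tilt):
    # Compute the angular extent of the viewport region from the signaled values.
    return {
        "yaw_min": center_yaw - delta_yaw / 2.0,
        "yaw_max": center_yaw + delta_yaw / 2.0,
        "tilt_min": center_tilt - delta_tilt / 2.0,
        "tilt_max": center_tilt + delta_tilt / 2.0,
    }

# Using the values signaled in the example: yaw 20, tilt 30, delta yaw 10, delta tilt 10.
print(roi_spherical_bounds(20, 30, 10, 10))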
[00165] With MPD 1100, a system can obtain the video file video1.mp4 and decode the file based on the indication that the ROI is included in the video file. The system can also extract the pixels from the decoded file according to the signaling information and provide the extracted pixels to the renderer for rendering.
[00166] Figure 12 is a flow chart that illustrates an example of a process 1200 for generating a media file. The process can be performed, for example, by a streaming server (for example, server 902 in Figure 9), by an intermediate network device between a host server and a receiving device, etc., which encapsulates encoded data in an ISO base media file (for example, an ISOBMFF file).
[00167] At 1202, process 1200 includes obtaining 360 degree video data, with the 360 degree video data including a spherical representation of a scene. The 360 degree video data can be generated by a set of cameras (for example, an omnidirectional camera set). The spherical representation can be formed, for example, by composing a set of images captured by the set of cameras at a particular moment in time.
[00168] At 1204, process 1200 includes determining a region of interest (ROI) in the spherical representation of the scene. The determination can be made based, for example, on an instruction to send a particular portion of the scene to a user (for example, as part of a director's cut), on a user's viewing direction or on other appropriate information. In some examples, the ROI can be defined by at least four planes that intersect with the spherical representation, each of the four planes also intersecting with the spherical center to form a great circle. For example, referring to Figure 5A, the ROI can be defined by four great circles 502, 504, 506 and 508.
[00169] At 1206, process 1200 includes generating a media file that includes first signaling information and second signaling information of a viewport region corresponding to the ROI, with the first signaling information including a central position and a dimension of the viewport region measured in a spherical space associated with the spherical representation, and the second signaling information indicating a region of a picture that comprises the viewport region. The picture can be formed by projecting the spherical representation that includes the ROI onto a plane using rectilinear projection and can be a video frame. The viewport region is to be rendered on a display. In some examples, the first signaling information and the second signaling information can also define multiple viewport regions corresponding to multiple ROIs, and one of the multiple viewport regions can be selected for rendering on the display.
[00170] In some aspects, the media file is based on the International Organization for Standardization (ISO) base media file format (ISOBMFF). The media file can identify a sample group that includes a video sample corresponding to the spherical video scene, and the first signaling information and the second signaling information are included in one or more syntax elements of the sample group.
[00171] In some examples, the media file is based on a media presentation description (MPD) format and includes a list of one or more adaptation sets. Each of the one or more adaptation sets may include one or more representations. The first signaling information, the second signaling information and a link to the picture are included in one or more elements associated with the ROI included in the one or more representations. In some examples, the one or more representations are tile-based representations, and the second signaling information includes identifiers associated with tiles that include the ROI in the one or more tile-based representations.
[00172] In some aspects, the first signaling information may include a first angle and a second angle of a center of the viewport region with respect to a spherical center of the spherical representation of the scene, with the first angle formed in a first plane and the second angle formed in a second plane, the first plane being perpendicular to the second plane. The first signaling information may additionally include a third angle associated with a width of the viewport region and a fourth angle associated with a height of the viewport region. The third angle can be formed between a first edge and a second edge of the viewport region, and the fourth angle is formed between a third edge and a fourth edge of the viewport region. For example, the first angle can be a yaw angle and the second angle can be a tilt angle, whereas the third angle and the fourth angle can be, respectively, a delta yaw angle and a delta tilt angle, as described in Figure 4C, Figure 4D and Figure 4E.
[00173] In some examples, the second signaling information may define one or more tiles of the picture that include the viewport region. The one or more tiles can be part of a plurality of tiles included in the picture. In some aspects, the second signaling information may include one or more coordinates associated with the one or more tiles in the picture. In some examples, the one or more tiles form a tile group, and the second signaling information may include a group identifier associated with the tile group. These tiles can be, for example, motion-constrained tiles.
[00174] In some aspects, the second signaling information may include pixel coordinates associated with a predetermined location within a viewport region formed by projecting the ROI onto a plane, a width of the viewport region and a height of the viewport region.
[00175] At 1208, process 1200 additionally includes providing the media file for rendering the 360 degree video data or for transmitting a portion of the 360 degree video data that includes at least the ROI. Rendering may include, for example, obtaining a set of tiles of the picture based on the second signaling information, determining the location and boundary of the viewport within the tile set based on the first signaling information, and extracting pixels corresponding to the viewport based on the determined location and boundary to render the viewport. The boundary can also be determined based on a predetermined shape of the viewport. The shape of the viewport can be predetermined based, for example, on a determination that the ROI is defined by at least four planes that intersect with the spherical representation, each of the four planes also intersecting with the spherical center of the spherical representation and each forming a great circle. For example, as discussed above, the ROI can be defined by four great circles 502, 504, 506 and 508, and the viewport can have the same shape as viewport 520 of Figure 5C. Furthermore, the transmission of the portion of the 360 degree video data may include, for example, determining the set of tiles in the picture that includes the ROI and transmitting video data corresponding to the set of tiles to a renderer for rendering the ROI.
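The tile-selection part of the transmission step can be sketched as below, assuming tiles are addressed by identifiers taken from the second signaling information and that the sender keeps the per-tile payloads in a dictionary; both assumptions are made only so the sketch stays self-contained.

def tiles_to_transmit(roi_tile_ids, available_tiles):
    # Keep only the tiles listed in the second signaling information.
    missing = [tile_id for tile_id in roi_tile_ids if tile_id not in available_tiles]
    if missing:
        raise KeyError(f"tiles not present in the picture: {missing}")
    return {tile_id: available_tiles[tile_id] for tile_id in roi_tile_ids}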
[00176] Figure 13 is a flow chart that illustrates an example of a process 1300 for processing a media file. The process can be carried out, for example, by an intermediate network device between a host server and a receiving device, by a receiving device, etc.
[00177] At 1302, process 1300 includes obtaining a media file associated with 360 degree video data. The 360 degree video data can be generated by a set of cameras (for example, an omnidirectional camera set). The spherical representation can be formed, for example, by composing a set of images captured by the set of cameras at a particular moment in time. The media file can include first signaling information and second signaling information of a viewport region corresponding to a region of interest (ROI) in the spherical representation.
[00178] In some examples, the ROI can be defined by at least four planes that intersect with the spherical representation, each of the four planes also intersecting with the spherical center to form a great circle. For example, referring to Figure 5A, the ROI can be defined by four great circles 502, 504, 506 and 508.
[00179] At 1304, process 1300 includes extracting, from the picture data, pixels corresponding to the viewport region based on the first signaling information and the second signaling information.
[00180] In some aspects, the media file is based on the International Organization for Standardization (ISO) base media file format (ISOBMFF). The media file can identify a sample group that includes a video sample corresponding to the spherical video scene, and the first signaling information and the second signaling information are included in one or more syntax elements of the sample group.
[00181] In some examples, the media file is based on a media presentation description (MPD) format and includes a list of one or more adaptation sets. Each of the one or more adaptation sets may include one or more representations. The first signaling information, the second signaling information and a link to the picture are included in one or more elements associated with the ROI included in the one or more representations. In some examples, the one or more representations are tile-based representations, and the second signaling information includes identifiers associated with tiles that include the ROI in the one or more tile-based representations.
[00182] In some aspects, the first signaling information may include a first angle and a second angle of a center of the viewport region with respect to a spherical center of the spherical representation of the scene, with the first angle formed in a first plane and the second angle formed in a second plane, the first plane being perpendicular to the second plane. The first signaling information may additionally include a third angle associated with a width of the viewport region and a fourth angle associated with a height of the viewport region. The third angle can be formed between a first edge and a second edge of the viewport region, and the fourth angle is formed between a third edge and a fourth edge of the viewport region. For example, the first angle can be a yaw angle and the second angle can be a tilt angle, while the third angle and the fourth angle can be a delta yaw angle and a delta tilt angle, respectively, as described in Figure 4C, Figure 4D and Figure 4E.
[00183] In some examples, the second signaling information may define one or more tiles of the picture that include the viewport region. The one or more tiles can be part of a plurality of tiles included in the picture. In some aspects, the second signaling information may include one or more coordinates associated with the one or more tiles in the picture. In some examples, the one or more tiles form a tile group, and the second signaling information may include a group identifier associated with the tile group. These tiles can be, for example, motion-constrained tiles.
[00184] In some aspects, the second signaling information may include pixel coordinates associated with a predetermined location within a viewport region formed by projecting the ROI onto a plane, a width of the viewport region and a height of the viewport region.
[00185] In some examples, the extraction of the pixels may include identifying a set of tiles in the picture that contain the viewport region and extracting the pixels from the set of tiles. The extraction of the pixels can additionally include determining a location and a boundary of the viewport in the set of tiles. The location can be determined based on the yaw angle and the tilt angle that indicate the central position of the viewport region, whereas the boundary can be determined based on the width and height indicated, respectively, by the delta yaw angle and the delta tilt angle. The boundary can also be determined based on a predetermined shape of the viewport region. The shape can be determined based on the ROI being defined by at least four planes that intersect with the spherical representation, each of the four planes also intersecting with the spherical center of the spherical representation to form a great circle. For example, the shape of the viewport may be the same as that of viewport 520 of Figure 5C. The extraction of the pixels can be based on the location and boundary of the viewport region.
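The cropping step can be sketched as below, assuming the decoder exposes the union of the identified tiles as a row-major pixel array together with the position of that union within the full picture; those assumptions, and the function name, are illustrative only.

def extract_viewport_pixels(tile_pixels, tile_origin, viewport_left, viewport_top,
                            viewport_width, viewport_height):
    # Crop the viewport region out of the decoded tile set.
    # tile_pixels: row-major 2D list covering the union of the tiles that contain the viewport.
    # tile_origin: (left, top) position of that union within the full picture.
    origin_left, origin_top = tile_origin
    local_left = viewport_left - origin_left
    local_top = viewport_top - origin_top
    return [row[local_left:local_left + viewport_width]
            for row in tile_pixels[local_top:local_top + viewport_height]]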
[00186] At 1306, process 1300 additionally includes providing the extracted pixels to render the viewport region on a display.
[00187] In some examples, processes 1200 and 1300 can be performed by a computing device or apparatus, such as system 100 shown in Figure 1. In some examples, processes 1200 and 1300 can be performed by a file generation device, a file parsing or processing device, the encoding device 104 shown in Figure 1 and Figure 14, another device on the transmission side or video transmission device, the decoding device 112 shown in Figure 1 and Figure 15, and/or another device on the client side, such as a player device, a display or any other client-side device. In one example, process 1200 can be performed by a file generation device, the encoding device 104 shown in Figure 1 and Figure 14, and/or another device on the transmission side or video transmission device. In another example, process 1300 can be performed by a file parsing or processing device, the decoding device 112 shown in Figure 1 and Figure 15, and/or another device on the client side, such as a player device, a display or any other client-side device. In some cases, the computing device or apparatus may include a processor, microprocessor, microcomputer or other component of a device that is configured to carry out the steps of processes 1200 and 1300. In some examples, the computing device or apparatus may include a camera configured to capture video data (for example, a video sequence) that includes video frames. In some examples, a camera or other capture device that captures the video data is separate from the computing device, in which case the computing device receives or obtains the captured video data. The computing device may additionally include a network interface configured to communicate the video data. The network interface can be configured to communicate Internet Protocol (IP)-based data or other data. In some examples, the computing device or apparatus may include a display to display the output video content, such as sample pictures of a video bit stream.
[00188] Processes 1200 and 1300 are illustrated as logical flow diagrams, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be interpreted as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
[00189] Additionally, processes 1200 and 1300 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (for example, executable instructions, one or more computer programs or one or more applications) that is executed collectively on one or more processors, by hardware or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium can be transitory.
[00190] The encoding techniques discussed in this document can be implemented in an example video encoding and decoding system (for example, system 100). In some examples, a system includes a source device that provides encoded video data to be decoded later by a destination device. In particular, the source device provides the video data to the destination device via computer-readable media. The source device and the destination device can comprise any of a wide variety of devices, including desktop computers, notebook computers (that is, laptop computers), tablet computers, set-top boxes, telephone handsets, such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[00191] The target device can receive the encoded video data to be decoded by means of the computer-readable media. The computer-readable media can comprise any type of media or device capable of moving the encoded video data from the source device to the destination device. In one example, the computer-readable media may comprise a communication medium to enable the source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network or a global network, such as the Internet. The communication medium may include routers, switches, base stations or any other equipment that may be useful to facilitate communication from the source device to the destination device.
[00192] In some examples, the encoded data can be sent from the output interface to a storage device. Similarly, the encoded data can be accessed from the storage device via the input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory or any other digital storage media suitable for storing encoded video data. In an additional example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device. The target device can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the target device. Exemplary file servers include a web server (for example, for a website), an FTP server, network attached storage (NAS) devices or a local disk drive. The target device can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.) or a combination of the two that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission or a combination thereof.
[00193] The techniques of the present disclosure are not necessarily limited to wireless applications or configurations. The techniques can be applied to video encoding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet video streaming transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, systems can be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.
[00194] In one example, the source device includes a video source, a video encoder and an output interface. The target device can include an input interface, a video decoder and a display device. The video encoder of the source device can be configured to apply the
techniques disclosed in this document. In other examples, a source device and a target device can include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. Likewise, the target device can interface with an external display device, instead of including an integrated display device.
[00195] The example system above is merely one example. Techniques for processing video data in parallel can be performed by any digital video encoding and/or decoding device. Although, in general, the techniques of the present disclosure are performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a CODEC. In addition, the techniques of the present disclosure can also be performed by a video preprocessor. The source device and the destination device are merely examples of such coding devices, with the source device generating encoded video data for transmission to the destination device. In some examples, the source and destination devices may operate in a substantially symmetrical manner so that each of the devices includes video encoding and decoding components. Therefore, exemplary systems can support unidirectional or bidirectional video transmission between video devices, for example, for video streaming, video playback, video broadcasting or video telephony.
[00196] The video source may include a video capture device, such as a video camera, a video archive that contains previously captured video and/or a video feed interface for receiving video from a video content provider. As an additional alternative, the video source can generate computer graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device can form so-called camera phones or video phones. However, as mentioned above, the techniques described in the present disclosure can be applicable to video encoding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder. The encoded video information can then be output by the output interface onto the computer-readable media.
[00197] As noted, the computer-readable media may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Similarly, a computing device in a media production facility, such as a disc stamping facility, can receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, computer-readable media may be understood to include one or more computer-readable media of various forms, in various examples.
[00198] The input interface of the target device receives information from the computer-readable media. The information of the computer-readable media may include syntax information defined by the video encoder, which is also used by the video decoder, and which includes syntax elements that describe characteristics and/or processing of blocks and other coded units, for example, groups of pictures (GOPs). A display device displays the decoded video data to a user and can comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display or another type of display device. Various embodiments of the invention have been described.
[00199] The specific details of the encoding device 104 and the decoding device 112 are shown in Figure 14 and Figure 15, respectively. Figure 14 is a block diagram illustrating an example encoding device 104 that can implement one or more of the techniques described in the present disclosure. The encoding device 104 can, for example, generate the syntax structures described in this document (for example, the syntax structures of a VPS, SPS, PPS or other syntax elements). The encoding device 104 can perform intra-prediction and inter-prediction encoding of video blocks within video slices. As previously described, intra-coding depends, at least partially, on spatial prediction to reduce or remove spatial redundancy within a given video frame or picture. Inter-coding depends, at least partially, on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. Intra-mode (I mode) can refer to any of several spatially based compression modes. Inter-modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporally based compression modes.
[00200] The encoding device 104 includes a partitioning unit 35, prediction processing unit 41, filter unit 63, picture memory 64, adder 50, transform processing unit 52, quantization unit 54 and entropy coding unit 56. The prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44 and intra-prediction processing unit 46. For video block reconstruction, the encoding device 104 also includes inverse quantization unit 58, inverse transform processing unit 60 and adder 62. The filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in Figure 14 as an in-loop filter, in other configurations, filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 can perform additional processing on the encoded video data generated by the encoding device 104. The techniques of the present disclosure can, in some instances, be implemented by the encoding device 104. However, in other examples, one or more of the techniques of the present disclosure can be implemented by the post-processing device 57.
[00201] As shown in Figure 14, the encoding device 104 receives video data, and the partitioning unit 35 partitions the data into video blocks. The partitioning can also include partitioning into slices, slice segments, tiles or other larger units, as well as video block partitioning, for example, according to a quadtree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into multiple video blocks (and possibly into sets of video blocks called tiles). The prediction processing unit 41 can select one of a plurality of possible coding modes, such as one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (for example, coding rate and level of distortion, or the like). The prediction processing unit 41 can supply the resulting intra-coded or inter-coded block to adder 50 to generate residual block data and to adder 62 to reconstruct the encoded block for use as a reference picture.
[00202] The intra-prediction processing unit 46 within the prediction processing unit 41 can perform intra-prediction encoding of the current video block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded in order to provide spatial compression. The motion estimation unit 42 and the motion compensation unit 44 within the prediction processing unit 41 perform inter-predictive encoding of the current video block in relation to one or more predictive blocks in one or more reference pictures in order to provide temporal compression.
[00203] The motion estimation unit 42 can be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate video slices in the sequence as P slices, B slices or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by the motion estimation unit 42, is the process of generating motion vectors, which estimate the motion for the video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within a current video frame or picture in relation to a predictive block within a reference picture.
[00204] A predictive block is a block that is found to closely match the PU of the video block to be encoded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD) or other difference metrics. In some examples, the encoding device 104 can calculate values for sub-integer pixel positions of reference pictures stored in the picture memory 64. For example, the encoding device 104 can interpolate values of one-quarter pixel positions, one-eighth pixel positions or other fractional pixel positions of the reference picture. Therefore, the motion estimation unit 42 can perform a motion search in relation to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
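For reference, the usual definitions of these block-matching metrics (standard definitions, not specific to this disclosure), for a candidate block B and the block C being predicted, both of size N x M, are:

\mathrm{SAD}(C,B) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left| C(i,j) - B(i,j) \right|, \qquad
\mathrm{SSD}(C,B) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left( C(i,j) - B(i,j) \right)^{2}

The candidate block with the smallest value of the chosen metric is taken as the best match.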
[00205] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture can be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in the picture memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy coding unit 56 and the motion compensation unit 44.
[00206] Motion compensation, performed by the motion compensation unit 44, may involve obtaining or generating the predictive block based on the motion vector determined by the motion estimation, possibly performing interpolations to sub-pixel precision. Upon receipt of the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in a reference picture list. The encoding device 104 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being encoded, forming pixel difference values. The pixel difference values form residual data for the block and can include both luma and chroma difference components. The adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and the video slice for use by the decoding device 112 in decoding the video blocks of the video slice.
[00207] The intra-prediction processing unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 can determine an intra-prediction mode to be used to encode a current block. In some examples, the intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, for example, during separate encoding passes, and the intra-prediction processing unit 46 may select a suitable intra-prediction mode to use from the tested modes. For example, the intra-prediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and can select the intra-prediction mode that has the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. The intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
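A common way to express such a comparison (a standard formulation, not one mandated by this disclosure) is the Lagrangian rate-distortion cost

J(m) = D(m) + \lambda \cdot R(m)

where, for a candidate mode m, D(m) is the distortion between the reconstructed block and the original block, R(m) is the number of bits needed to encode the block with that mode, and \lambda is a Lagrange multiplier that balances the two; the mode with the smallest cost J is selected.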
[00208] In any case, after selecting an intra-prediction mode for a block, the intra-prediction processing unit 46 can provide information indicative of the intra-prediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information that indicates the selected intra-prediction mode. The encoding device 104 may include, in the transmitted bit stream configuration data, definitions of encoding contexts for various blocks, as well as indications of a most probable intra-prediction mode, an intra-prediction mode index table and a modified intra-prediction mode index table to be used for each of the contexts. The bit stream configuration data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also called codeword mapping tables).
[00209] After the prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, the encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block can be included in one or more TUs and applied to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. The transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
[00210] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then scan the matrix that includes the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the scan.
[00211] After quantization, the entropy coding unit 56 entropy encodes the quantized transform coefficients. For example, the entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. Following the entropy coding by the entropy coding unit 56, the encoded bit stream can be transmitted to the decoding device 112, or archived for later transmission or retrieval by the decoding device 112. The entropy coding unit 56 can also entropy encode the motion vectors and the other syntax elements for the current video slice being encoded.
[00212] The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, in order to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within a reference picture list. The motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by the motion compensation unit 44 in order to produce a reference block for storage in the picture memory 64. The reference block can be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
[00213] In this way, the encoding device 104 of Figure 14 represents an example of a video encoder configured to derive LIC parameters, adaptively determine template sizes and/or adaptively select weights. The encoding device 104 can, for example, derive LIC parameters, adaptively determine template sizes and/or adaptively select weight sets, as described above. For example, the encoding device 104 can perform any of the techniques described in this document, including the processes described above with respect to Figure 12 and Figure 13. In some cases, some of the techniques of this disclosure may also be implemented by the post-processing device 57.
[00214] Figure 15 is a block diagram illustrating an example decoding device 112. The decoding device 112 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, filter unit 91 and picture memory 92. The prediction processing unit 81 includes motion compensation unit 82 and intra-prediction processing unit 84. The decoding device 112 may, in some examples, perform a decoding pass that is generally reciprocal to the encoding pass described with respect to the encoding device 104 of Figure 14.
[00215] During the decoding process, the decoding device 112 receives an encoded video bit stream that represents video blocks of an encoded video slice and associated syntax elements sent by the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bit stream from the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bit stream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/splicer or another such device configured to implement one or more of the techniques described above. Network entity 79 may or may not include the encoding device 104. Some of the techniques described in the present disclosure may be implemented by network entity 79 before network entity 79 transmits the encoded video bit stream to the decoding device 112. In some video decoding systems, network entity 79 and the decoding device 112 may be parts of separate devices, whereas in other instances the functionality described with respect to network entity 79 can be performed by the same device that comprises the decoding device 112.
[00216] The entropy decoding unit 80 of the decoding device 112 entropy decodes the bit stream to generate quantized coefficients, motion vectors and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 112 can receive the syntax elements at the video slice level and/or at the video block level. The entropy decoding unit 80 can process and parse both fixed-length syntax elements and variable-length syntax elements in one or more parameter sets, such as a VPS, SPS and PPS.
[00217] When the video slice is encoded as an intra-coded (I) slice, the intra-prediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and on data from previously decoded blocks of the current frame or picture. When the video frame is encoded as an inter-coded slice (that is, a B, P or GPB slice), the motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive blocks can be produced from one of the reference pictures within a reference picture list. The decoding device 112 can construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in the picture memory 92.
[00218] The motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 82 can use one or more syntax elements in a parameter set to determine a prediction mode (for example, intra-prediction or inter-prediction) used to encode the video blocks of the video slice, an inter-prediction slice type (for example, B slice, P slice or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, the inter-prediction status for each inter-coded video block in the slice and other information to decode the video blocks in the current video slice.
[00219] The motion compensation unit 82 can also perform interpolation based on interpolation filters. The motion compensation unit 82 can use interpolation filters, as used by the encoding device 104 during the encoding of the video blocks, to calculate interpolated values for sub-integer pixels of the reference blocks. In that case, the motion compensation unit 82 can determine the interpolation filters used by the encoding device 104 from the received syntax elements and can use the interpolation filters to produce the predictive blocks.
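By way of non-limiting illustration, the sketch below shows motion-compensated prediction with fractional-sample interpolation; a bilinear filter is used here purely for simplicity, whereas an actual codec uses the longer separable interpolation filters referred to above.

```python
# Illustrative sketch (assumptions only): fetch a predictive block at a
# quarter-sample motion vector using bilinear interpolation and border padding.
import numpy as np

def predict_block(reference, x, y, mv_x, mv_y, width, height):
    """Fetch a width x height predictive block at (x + mv_x, y + mv_y), where the
    motion vector is given in quarter-sample units."""
    pred = np.zeros((height, width), dtype=np.float64)
    ix, iy = mv_x // 4, mv_y // 4                 # integer-sample offsets
    fx, fy = (mv_x % 4) / 4.0, (mv_y % 4) / 4.0   # fractional-sample offsets
    for r in range(height):
        for c in range(width):
            rx, ry = x + c + ix, y + r + iy
            # Clamp to the picture border (padding), then interpolate bilinearly.
            x0 = min(max(rx, 0), reference.shape[1] - 2)
            y0 = min(max(ry, 0), reference.shape[0] - 2)
            a = reference[y0, x0] * (1 - fx) + reference[y0, x0 + 1] * fx
            b = reference[y0 + 1, x0] * (1 - fx) + reference[y0 + 1, x0 + 1] * fx
            pred[r, c] = a * (1 - fy) + b * fy
    return pred

ref = np.arange(64, dtype=np.float64).reshape(8, 8)
block = predict_block(ref, x=2, y=2, mv_x=5, mv_y=0, width=2, height=2)  # +1.25 samples in x
```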
[00220] The inverse quantization unit 86 inverse quantizes, or dequantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoding unit 80. The inverse quantization process may include the use of a quantization parameter calculated by the encoding device 104 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that must be applied. The inverse transform processing unit 88 applies an inverse transform (for example, an inverse DCT or other suitable inverse transform), an inverse integer transform or a conceptually similar inverse transform process to the transform coefficients in order to produce residual blocks in the pixel domain.
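By way of non-limiting illustration, the sketch below applies a simplified inverse quantization followed by a separable inverse DCT to recover a residual block; the flat scaling by a single quantization step is an assumption made for clarity and does not reproduce the exact scaling tables of a real codec.

```python
# Illustrative sketch (assumptions only): dequantization followed by a 2-D
# inverse DCT to recover a residual block.
import numpy as np
from scipy.fftpack import idct

def dequantize(levels, qstep):
    # Each transmitted level is scaled back by the quantization step size.
    return levels.astype(np.float64) * qstep

def inverse_transform(coeffs):
    # Separable inverse DCT over rows and columns; type 2 with 'ortho'
    # normalization inverts an orthonormal forward DCT.
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

levels = np.zeros((4, 4), dtype=np.int32)
levels[0, 0] = 10                        # a single quantized DC coefficient
residual = inverse_transform(dequantize(levels, qstep=8.0))
print(np.round(residual, 2))             # a flat 4x4 block of value 20.0
```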
[00221] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 112 forms a decoded video block by adding the residual blocks from the inverse transform processing unit 88 to the corresponding predictive blocks generated by the motion compensation unit 82. The adder 90 represents the component or components that perform this summing operation. If desired, loop filters (either in the coding loop or after the coding loop) can also be used to smooth pixel transitions or otherwise improve the video quality. Filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in Figure 15 as an in-loop filter, in other configurations, filter unit 91 can be implemented as a post-loop filter. The video blocks decoded in a given frame or picture are then stored in the picture memory 92, which stores the reference pictures used for subsequent motion compensation. Picture memory 92 also stores decoded video for later presentation on a display device, such as the video destination device 122 shown in Figure 1.
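By way of non-limiting illustration, the sketch below forms a decoded block by adding a residual block to a predictive block and clipping to the valid sample range, as performed by the adder 90 before any in-loop filtering; the array shapes and bit depth are illustrative assumptions.

```python
# Illustrative sketch (assumptions only): block reconstruction with clipping to
# the valid sample range for the given bit depth.
import numpy as np

def reconstruct_block(prediction, residual, bit_depth=8):
    max_value = (1 << bit_depth) - 1
    reconstructed = prediction.astype(np.int32) + residual.astype(np.int32)
    out_dtype = np.uint16 if bit_depth > 8 else np.uint8
    return np.clip(reconstructed, 0, max_value).astype(out_dtype)

pred = np.full((4, 4), 200, dtype=np.uint8)
resid = np.full((4, 4), 80, dtype=np.int32)      # would overflow without clipping
print(reconstruct_block(pred, resid))             # every sample clipped to 255
```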
[00222] Thus, the decoding device 112 of Figure 15 represents an example of a video decoder configured to derive LIC parameters, adaptively determine template sizes and/or selectively adapt weights. The decoding device 112 can, for example, derive LIC parameters, adaptively determine template sizes and/or adaptively select weight sets, as described above. For example, the decoding device 112 can perform any of the techniques described in this document, including the processes described above with respect to Figure 12 and Figure 13.
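By way of non-limiting illustration, the sketch below derives illumination compensation parameters (a scale and an offset) by a least-squares fit between template samples of the current block and the corresponding template samples of the reference block; the template selection shown is an assumption and not the specific derivation performed by the decoding device 112.

```python
# Illustrative sketch (assumptions only): fit pred_lic = scale * pred + offset
# from neighboring (template) samples using ordinary least squares.
import numpy as np

def derive_lic_parameters(current_template, reference_template):
    x = reference_template.astype(np.float64).ravel()
    y = current_template.astype(np.float64).ravel()
    n = x.size
    denom = n * np.dot(x, x) - x.sum() ** 2
    if denom == 0:
        return 1.0, 0.0                            # fall back to the identity model
    scale = (n * np.dot(x, y) - x.sum() * y.sum()) / denom
    offset = (y.sum() - scale * x.sum()) / n
    return scale, offset

ref_tpl = np.array([100, 110, 120, 130])
cur_tpl = ref_tpl * 1.1 + 5                        # current picture is brighter
scale, offset = derive_lic_parameters(cur_tpl, ref_tpl)
# The prediction is then adjusted as: pred_lic = scale * pred + offset
```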
[00223] In the aforementioned description, aspects of the present application are described with reference to the specific modalities of the same, however those skilled in the art will recognize that the present invention is not limited to them. Thus, although the illustrative modalities of the present application have been described in detail in this document, it is to be understood that the concepts of the invention may otherwise be incorporated and employed in various ways and that the attached claims are intended to be interpreted to include such variations, except when limited by the prior art. Various features and aspects of the invention mentioned above can be used individually or together. In addition, the modalities can be used in any number of environments and applications other than those described in this document without departing from the spirit and broader scope of this report. This report and the drawings should, therefore, be considered as illustrative
and not restrictive. By way of illustration, the methods have been described in a particular order. It should be noted that in alternative modalities, the methods can be performed in a different order than described.
[00224] When components are described as being configured to perform certain operations, such configuration can be achieved, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (for example, microprocessors or other suitable electronic circuits) to perform the operation, or by any combination thereof.
[00225] The various illustrative logic blocks, modules, circuits and algorithm steps described in combination with the modalities disclosed in this document can be implemented as electronic hardware, computer software, firmware or combinations thereof. In order to clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Persons skilled in the art can implement the described functionality in a variety of ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
[00226] The techniques described in the present document can also be implemented in electronic hardware, computer software, firmware or any combination thereof. Such techniques can be implemented in various devices, such as general purpose computers, wireless communication device handsets or integrated circuit devices that have multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as separate but interoperable logic devices. If implemented in software, the techniques can be realized at least in part by a computer-readable data storage medium comprising program code that includes instructions that, when executed, perform one or more of the methods described above. Computer-readable data storage media may form part of a computer program product, which may include packaging materials. Computer-readable media may comprise memory or data storage media, such as random access memory (RAM), such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media and the like. Additionally or alternatively, the techniques can be realized, at least partially, by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read and/or executed by a computer, such as propagated signals or waves.
[00227] The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other equivalent integrated or discrete logic circuitry. Such a processor can be configured to perform any of the techniques described in the present disclosure. A general purpose processor can be a microprocessor; however, alternatively, the processor can be any conventional processor, controller, microcontroller or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in combination with a DSP core, or any other such configuration. Consequently, the term processor, as used in this document, can refer to any of the aforementioned structures, any combination of the aforementioned structures, or any other structure or apparatus suitable for implementing the techniques described in this document. In addition, in some aspects, the functionality described in this document may be provided within dedicated software modules or hardware modules configured for encoding and decoding or incorporated in a combined video encoder-decoder (CODEC).
Claims (13)
1. Method for processing video data, the method comprising:
obtain a media file associated with 360 degree video data, with the 360 degree video data including a spherical representation of a scene, the media file including first signaling information and second signaling information of a viewing window region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure comprising the viewing window region, the figure being formed by projecting the spherical representation that includes the ROI onto a plane;
extract pixels corresponding to the viewing window region from data of the figure based on the first signaling information and the second signaling information; and
provide the pixels to render the viewing window region for display.
2. Method according to claim 1, the first signaling information including a first angle and a second angle of a center of the viewing window region with respect to a spherical center of the spherical representation of the scene, the first angle being formed in a first plane and the second angle being formed in a second plane, where the first plane is perpendicular to the second plane.
3. Method according to claim 1, wherein the first signaling information additionally includes a third angle associated with a width of the viewing window region and a fourth angle associated with a height of the viewing window region.
4. Method according to claim 3, wherein the third angle is formed between a first edge and a second edge of the viewing window region; and wherein the fourth angle is formed between a third edge and a fourth edge of the viewing window region.
5. Method according to claim 2, wherein the ROI is defined by at least four planes that intersect with the spherical representation; and wherein each of the four planes also intersects with the spherical center.
6. Method according to claim 5, further comprising determining a shape of the viewing window region based on the intersection of the at least four planes with the spherical representation.
7. Method according to claim 6, wherein the pixels corresponding to the viewing window region are extracted based on the shape.
8. Method according to claim 1, wherein the figure includes a plurality of mosaics;
the second signaling information defining one or more mosaics of the figure that include the viewing window region;
wherein the method additionally comprises:
obtaining the one or more mosaics from the plurality of mosaics based on the second signaling information; and extracting the pixels from the one or more mosaics.
9. Method according to claim 8, wherein the second signaling information includes one or more coordinates associated with the one or more mosaics in the figure.
10. Method according to claim 8, wherein the one or more mosaics form a group of mosaics, and wherein the second signaling information includes a group identifier associated with the mosaic group.
11. Method according to claim 8, wherein the plurality of mosaics are motion-constrained mosaics.
12. The method of claim 1, wherein the second signaling information includes pixel coordinates associated with a predetermined location within a viewing window region formed by projecting the ROI onto a plane, a width of the viewing window region and a height of the viewing window region.
13. Method according to claim 1, wherein the media file is based on a base media file format of the International Organization for Standardization (ISO) (ISOBMFF).
14. Method according to claim 13, wherein the media file identifies a sample group that includes a video sample corresponding to the spherical representation of the scene; and wherein the first signaling information and the second signaling information are included in one or more syntax elements of the sample group.
15. The method of claim 1, wherein:
the media file is based on a media presentation description (MPD) format and includes one or more adaptation sets;
each of the one or more adaptation sets includes one or more representations; and the first signaling information, the second signaling information and a link to the figure are included in one or more elements associated with the ROI included in the one or more representations;
and wherein the method further comprises: obtaining the figure based on the link included in the media file.
16. The method of claim 15, wherein the one or more representations are mosaic-based representations, and the second signaling information includes identifiers associated with mosaics that include the ROI included in the mosaic-based representations.
17. Method, according to claim 1, in which the spherical representation of the scene is projected onto the plane using a rectilinear projection.
18. Method according to claim 1, further comprising: extracting pixels from multiple ROIs of the figure based on the first signaling information and the second signaling information.
19. Apparatus for processing video data comprising:
a memory configured to store 360-degree video data; and a processor configured to:
obtain a media file associated with 360-degree video data, with the 360-degree video data including a spherical representation of a scene, the media file including first signaling information and second signaling information of a viewing window region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure that comprises the viewing window region, the figure being formed by projecting the spherical representation that includes the ROI onto a plane;
extract pixels corresponding to the viewing window region from data of the figure based on the first signaling information and the second signaling information; and
provide the pixels to render the viewing window region for display.
20. Apparatus according to claim 19, the processor being additionally configured to:
determine, from the first signaling information, a first angle and a second angle of a center of the viewing window region with respect to a spherical center of the spherical representation of the scene, the first angle being formed in a first plane and the second angle being formed in a second plane, where the first plane is perpendicular to the second plane.
21. Apparatus according to claim 19, the processor being additionally configured to:
determine, from the first signaling information, a third angle associated with a width of the viewing window region and a fourth angle associated with a height of the viewing window region.
22. Apparatus according to claim 20, wherein the third angle is formed between a first edge and a second edge of the viewing window region; and wherein the fourth angle is formed between a third edge and a fourth edge of the viewing window region.
23. Apparatus according to claim 19, wherein the ROI is defined by at least four planes that intersect with the spherical representation; and where each of the four planes also intersects with the spherical center.
24. Apparatus according to claim 23, wherein the processor is further configured to determine a shape of the viewing window region based on the intersection of the at least four planes with the spherical representation.
25. Apparatus according to claim 24, wherein the processor is configured to extract the pixels corresponding to the viewing window region based on the shape.
26. Apparatus according to claim 19, wherein the figure includes a plurality of mosaics;
wherein the second signaling information defines one or more mosaics of the figure that include the viewing window region;
wherein the processor is additionally configured to:
obtain the one or more mosaics from the plurality of mosaics based on the second signaling information; and extract the pixels from the one or more mosaics.
27. Apparatus according to claim 26, wherein the processor is additionally configured to determine, from the second signaling information, one or more coordinates associated with the one or more mosaics in the figure.
28. Apparatus according to claim 26, wherein the one or more mosaics form a group of mosaics, and wherein the processor is further configured to determine, from the second signaling information, a group identifier associated with the mosaic group.
29. Apparatus according to claim 26, wherein the plurality of mosaics are motion-constrained mosaics.
30. Apparatus according to claim 19, wherein the processor is additionally configured to determine, from the second signaling information, pixel coordinates associated with a predetermined location within a viewing window region formed by projecting the ROI onto a plane, a width of the viewing window region and a height of the viewing window region.
31. Apparatus according to claim 19, in which the media file is based on a base media file format of the International Organization for Standardization (ISO) (ISOBMFF).
32. Apparatus according to claim 31, wherein the media file identifies a sample group that includes a video sample corresponding to the spherical representation of the scene; and wherein the processor is further configured to extract the first signaling information and the second signaling information from one or more syntax elements of the sample group.
33. Apparatus according to claim 19, wherein:
the media file is based on a media presentation description (MPD) format and includes one or more adaptation sets;
each of the one or more adaptation sets includes one or more representations; and wherein the processor is additionally configured to:
determine, based on one or more elements associated with the ROI included in the one or more representations, the first signaling information, the second signaling information and a link to the figure; and obtain the figure based on the link included in the media file.
34. Apparatus according to claim 33, wherein the one or more representations are mosaic-based representations, and wherein the processor is configured to determine, based on the second signaling information, identifiers associated with mosaics that include the ROI included in the mosaic-based representations.
35. Apparatus according to claim 19, in which the spherical representation of the scene is projected onto the plane using a rectilinear projection.
36. Apparatus according to claim 19, wherein the processor is additionally configured to extract pixels from multiple ROIs of the figure based on the first signaling information and the second signaling information.
37. Apparatus according to claim 19, wherein the apparatus comprises a mobile device with one or more cameras for capturing video data in 360 degrees.
38. Apparatus according to claim 19, wherein the apparatus comprises a display for rendering the viewing window region.
39. Non-transitory computer-readable media having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to:
obtain a media file associated with 360-degree video data, with the 360-degree video data including a spherical representation of a scene, the media file including first signaling information and second signaling information of a viewing window region corresponding to a region of interest (ROI) in the spherical representation, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure that comprises the viewing window region, the figure being formed by projecting the spherical representation that includes the ROI onto a plane;
extract pixels corresponding to the viewing window region from data of the figure based on the first signaling information and the second signaling information; and
provide the pixels to render the viewing window region for display.
40. Method for processing video data, the method comprising:
obtain 360 degree video data, the 360 degree video data including a spherical representation of a scene;
determine a region of interest (ROI) in the spherical representation of the scene;
generate a media file that includes first signaling information and second signaling information of a viewing window region corresponding to the ROI, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure comprising the viewing window region, wherein the figure is formed by projecting the spherical representation that includes the ROI onto a plane; and
provide the media file for rendering the 360 degree video data or for transmitting a portion of the 360 degree video data that includes at least the ROI.
41. Apparatus for processing video data comprising:
a memory configured to store 360 degree video data; and a processor configured to:
obtain 360 degree video data, the 360 degree video data including a spherical representation of a scene;
determine a region of interest (ROI) in the spherical representation of the scene;
generate a media file that includes first signaling information and second signaling information of a viewing window region corresponding to the ROI, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure comprising the viewing window region, wherein the figure is formed by projecting the spherical representation that includes the ROI onto a plane; and
provide the media file for rendering the 360 degree video data or for transmitting a portion of the 360 degree video data that includes at least the ROI.
42. Non-transitory computer-readable media having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to:
obtain 360 degree video data, with 360 degree video data including a spherical representation of a scene;
determine a region of interest (ROI) in the spherical representation of the scene;
generate a media file that includes first signaling information and second signaling information of a viewing window region corresponding to the ROI, the first signaling information including a central position of the viewing window region and a height and a width of the viewing window region measured in a spherical space associated with the spherical representation, and the second signaling information identifying Cartesian coordinates of a region of a figure comprising the viewing window region, wherein the figure is formed by projecting the spherical representation that includes the ROI onto a plane; and
provide the media file for rendering the 360 degree video data or for transmitting a portion of the 360 degree video data that includes at least the ROI.
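By way of non-limiting illustration of the geometry recited in claims 2 through 7 above, the sketch below tests whether a point of the spherical representation falls within a viewing window region given by a central position (azimuth, elevation) and horizontal and vertical angular extents; the construction of the bounding planes, all names and the assumption that the viewport center is not at a pole are illustrative only.

```python
# Illustrative sketch (assumptions only): a viewing window region bounded by
# planes through the sphere center, tested against a point on the sphere.
import numpy as np

def unit_vector(azimuth_deg, elevation_deg):
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def inside_viewport(point, center_az, center_el, h_extent_deg, v_extent_deg):
    # Measure the horizontal and vertical angular offsets of the point from the
    # viewport center and compare them with half of each angular extent. The
    # boundaries are planes containing the sphere center.
    center = unit_vector(center_az, center_el)
    up = np.array([0.0, 0.0, 1.0])               # assumes the center is not at a pole
    right = np.cross(up, center)
    right /= np.linalg.norm(right)
    local_up = np.cross(center, right)
    h_angle = np.degrees(np.arctan2(np.dot(point, right), np.dot(point, center)))
    v_angle = np.degrees(np.arctan2(np.dot(point, local_up), np.dot(point, center)))
    return abs(h_angle) <= h_extent_deg / 2 and abs(v_angle) <= v_extent_deg / 2

p = unit_vector(10.0, 5.0)
print(inside_viewport(p, center_az=0.0, center_el=0.0,
                      h_extent_deg=60.0, v_extent_deg=40.0))   # True
```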
Similar technologies:
Publication number | Publication date | Patent title
KR102348538B1|2022-01-07|Method and system for processing 360 degree video data
KR102204178B1|2021-01-15|Systems and methods of signaling of regions of interest
US10620441B2|2020-04-14|Viewport-aware quality metric for 360-degree video
US10643301B2|2020-05-05|Adaptive perturbed cube map projection
JP6676771B2|2020-04-08|Storage of virtual reality video in media files
KR20190008223A|2019-01-23|Circular fisheye video in virtual reality
US20180276890A1|2018-09-27|Advanced signalling of regions of interest in omnidirectional visual media
US11062738B2|2021-07-13|Signalling of video content including sub-picture bitstreams for video coding
KR102185811B1|2020-12-03|Enhanced signaling of regions of interest of container files and video bitstreams
BR112019013871A2|2020-03-03|PERFECTED RESTRICTED SCHEME DESIGN FOR VIDEO
US20220078486A1|2022-03-10|An apparatus, a method and a computer program for video coding and decoding
US20210314626A1|2021-10-07|Apparatus, a method and a computer program for video coding and decoding
WO2021047820A1|2021-03-18|An apparatus, a method and a computer program for video coding and decoding
Family patents:
Publication number | Publication date
JP2020501436A|2020-01-16|
TWI712313B|2020-12-01|
TW201824865A|2018-07-01|
CN110024400A|2019-07-16|
CN110024400B|2021-08-24|
KR20190091275A|2019-08-05|
US10652553B2|2020-05-12|
EP3552394A1|2019-10-16|
JP6799159B2|2020-12-09|
US20180160123A1|2018-06-07|
KR102204178B1|2021-01-15|
WO2018106548A1|2018-06-14|
Cited documents:
Publication number | Application date | Publication date | Applicant | Patent title

JP2003141562A|2001-10-29|2003-05-16|Sony Corp|Image processing apparatus and method for nonplanar image, storage medium, and computer program|
US7103212B2|2002-11-22|2006-09-05|Strider Labs, Inc.|Acquisition of three-dimensional images by an active stereo technique using locally unique patterns|
US9344612B2|2006-02-15|2016-05-17|Kenneth Ira Ritchey|Non-interference field-of-view support apparatus for a panoramic facial sensor|
US9025933B2|2010-02-12|2015-05-05|Sony Corporation|Information processing device, information processing method, playback device, playback method, program and recording medium|
CN103096014B|2011-10-28|2016-03-30|华为技术有限公司|A kind of video presentation method and system|
WO2015009676A1|2013-07-15|2015-01-22|Sony Corporation|Extensions of motion-constrained tile sets sei message for interactivity|
US10721530B2|2013-07-29|2020-07-21|Koninklijke Kpn N.V.|Providing tile video streams to a client|
US9497358B2|2013-12-19|2016-11-15|Sony Interactive Entertainment America Llc|Video latency reduction|
WO2015197818A1|2014-06-27|2015-12-30|Koninklijke Kpn N.V.|Hevc-tiled video streaming|
KR101953679B1|2014-06-27|2019-03-04|코닌클리즈케 케이피엔 엔.브이.|Determining a region of interest on the basis of a hevc-tiled video stream|
US10291561B2|2015-02-09|2019-05-14|Nokia Technologies Oy|Apparatus, a method and a computer program for image coding and decoding|
JP6566698B2|2015-04-13|2019-08-28|キヤノン株式会社|Display control apparatus and display control method|
US10977764B2|2015-12-29|2021-04-13|Dolby Laboratories Licensing Corporation|Viewport independent image coding and rendering|
WO2017203098A1|2016-05-24|2017-11-30|Nokia Technologies Oy|Method and an apparatus and a computer program for encoding media content|
US10360721B2|2016-05-26|2019-07-23|Mediatek Inc.|Method and apparatus for signaling region of interests|
WO2018068213A1|2016-10-10|2018-04-19|华为技术有限公司|Video data processing method, and apparatus|
US10917564B2|2016-10-12|2021-02-09|Qualcomm Incorporated|Systems and methods of generating and processing files for partial decoding and most interested regions|US10560660B2|2017-01-04|2020-02-11|Intel Corporation|Rectilinear viewport extraction from a region of a wide field of view using messaging in video transmission|
US10742999B2|2017-01-06|2020-08-11|Mediatek Inc.|Methods and apparatus for signaling viewports and regions of interest|
WO2018131813A1|2017-01-10|2018-07-19|Samsung Electronics Co., Ltd.|Method and apparatus for generating metadata for 3d images|
US10560680B2|2017-01-28|2020-02-11|Microsoft Technology Licensing, Llc|Virtual reality with interactive streaming video and likelihood-based foveation|
US10643301B2|2017-03-20|2020-05-05|Qualcomm Incorporated|Adaptive perturbed cube map projection|
US20190373245A1|2017-03-29|2019-12-05|Lg Electronics Inc.|360 video transmission method, 360 video reception method, 360 video transmission device, and 360 video reception device|
US10506255B2|2017-04-01|2019-12-10|Intel Corporation|MV/mode prediction, ROI-based transmit, metadata capture, and format detection for 360 video|
WO2019006336A1|2017-06-30|2019-01-03|Vid Scale, Inc.|Weighted to spherically uniform psnr for 360-degree video quality evaluation using cubemap-based projections|
US11202117B2|2017-07-03|2021-12-14|Telefonaktiebolaget Lm Ericsson |Methods for personalized 360 video delivery|
US10217488B1|2017-12-15|2019-02-26|Snap Inc.|Spherical video editing|
US20190385372A1|2018-06-15|2019-12-19|Microsoft Technology Licensing, Llc|Positioning a virtual reality passthrough region at a known distance|
US11032590B2|2018-08-31|2021-06-08|At&T Intellectual Property I, L.P.|Methods, devices, and systems for providing panoramic video content to a mobile device from an edge server|
US10826964B2|2018-09-05|2020-11-03|At&T Intellectual Property I, L.P.|Priority-based tile transmission system and method for panoramic video streaming|
US10779014B2|2018-10-18|2020-09-15|At&T Intellectual Property I, L.P.|Tile scheduler for viewport-adaptive panoramic video streaming|
US11184461B2|2018-10-23|2021-11-23|At&T Intellectual Property I, L.P.|VR video transmission with layered video by re-using existing network infrastructures|
US11190786B2|2019-09-24|2021-11-30|At&T Intellectual Property I, L.P.|Transcoding ultra-high-definition panoramic videos|
CN110618398B|2019-09-24|2020-09-29|深圳市拜洛克科技有限公司|Method for controlling luminescence of silk sticks based on UWB positioning technology|
CN112511866A|2019-12-03|2021-03-16|中兴通讯股份有限公司|Media resource playing and text rendering method, device, equipment and storage medium|
CN112055263B|2020-09-08|2021-08-13|西安交通大学|360-degree video streaming transmission system based on significance detection|
CN113470127B|2021-09-06|2021-11-26|成都国星宇航科技有限公司|Optical image effective compression method based on satellite-borne cloud detection|
Legal status:
2021-10-05| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Application date | Patent title
US201662431375P| true| 2016-12-07|2016-12-07|
US15/828,281|US10652553B2|2016-12-07|2017-11-30|Systems and methods of signaling of regions of interest|
PCT/US2017/064349|WO2018106548A1|2016-12-07|2017-12-01|Systems and methods of signaling of regions of interest|